public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk)
@ 2022-11-17 16:37 Andrea Corallo
  2022-11-17 16:37 ` [PATCH 01/35] arm: improve vcreateq* tests Andrea Corallo
                   ` (35 more replies)
  0 siblings, 36 replies; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

Hi all,

this is the first patch series aimed at improving the current MVE
implementation and testsuite. Its goals are:

- Completing intrinsic implementation and coverage (the list of intrinsics
  is specified by [1])
- Verifying all instructions each intrinsic is expected to emit
- Verifying register usage
- Fixing the existing scan-assembler checks so that they actually match the
  intended mnemonics
- Verifying no external calls are emitted
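
To illustrate the kind of checks listed above, here is a minimal sketch in
Python (not the DejaGnu directives the testsuite itself uses). Everything in
it is an assumption made for illustration: the assembly strings, the symbol
`__aeabi_vaddxi16`, and the helper `check` are hypothetical and are not taken
from the patches. It shows how a loose pattern can accept output that a
stricter, anchored pattern rejects, and how a call instruction can be
detected.

```python
import re

# Hypothetical compiler output for one test function (illustrative only).
GOOD_ASM = "foo:\n\tvadd.i16\tq0, q0, q1\n\tbx\tlr\n"

# Hypothetical bad output: a call to a made-up helper instead of the
# expected vector instruction.
BAD_ASM = "foo:\n\tbl\t__aeabi_vaddxi16\n\tbx\tlr\n"

# Loose check: unanchored, and the unescaped '.' matches any character.
LOOSE = re.compile(r"vadd.i16")

# Stricter check: whole line, literal '.', and Q registers as operands.
STRICT = re.compile(r"^\tvadd\.i16\tq[0-9]+, q[0-9]+, q[0-9]+$", re.MULTILINE)

# A branch-with-link at the start of an instruction line is treated here
# as a sign of an external call (a simplifying assumption).
CALL = re.compile(r"^\tblx?\t", re.MULTILINE)


def check(asm):
    """Report which of the three illustrative checks pass for `asm`."""
    return {
        "loose": bool(LOOSE.search(asm)),
        "strict": bool(STRICT.search(asm)),
        "no_call": CALL.search(asm) is None,
    }


if __name__ == "__main__":
    print(check(GOOD_ASM))
    print(check(BAD_ASM))
```

In this sketch the loose pattern accepts both inputs, because it also matches
inside the made-up symbol name, while the anchored pattern and the call check
flag the second input.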

This series fixes the backend where necessary.

Best Regards

  Andrea

Andrea Corallo (31):
  arm: improve vcreateq* tests
  arm: fix 'vmsr' spacing and register capitalization
  arm: improve tests and fix vddupq*
  arm: improve tests and fix vdwdupq*
  arm: improve vidupq* tests
  arm: improve tests and fix vdupq*
  arm: improve tests and fix vcmp*
  arm: improve tests for vmin*
  arm: improve tests for vmax*
  arm: improve tests for vabavq*
  arm: improve tests for vabdq*
  arm: improve tests and fix vabsq*
  arm: improve tests and fix vadd*
  arm: improve tests for vmulq*
  arm: improve tests and fix vsubq*
  arm: improve tests for vfmasq_m*
  arm: improve tests for vhaddq_m*
  arm: improve tests for vhsubq_m*
  arm: improve tests for viwdupq*
  arm: improve tests for vmladavaq*
  arm: improve tests and fix vmlaldavaxq*
  arm: improve tests for vmlasq*
  arm: improve tests for vqaddq_m*
  arm: improve tests for vqdmlahq_m*
  arm: improve tests for vqdmul*
  arm: improve tests for vqrdmlahq*
  arm: improve tests for vqrdmlashq_m*
  arm: improve tests for vqsubq*
  arm: improve tests and fix vrmlaldavhaq*
  arm: improve tests for vrshlq*
  arm: improve tests for vsetq_lane*

Stam Markianos-Wright (4):
  arm: further fix overloading of MVE vaddq[_m]_n intrinsic
  arm: propagate fixed overloading of MVE intrinsic scalar parameters
  arm: Explicitly specify other float types for _Generic overloading
    [PR107515]
  arm: Add integer vector overloading of vsubq_x intrinsic

 gcc/config/arm/arm_mve.h                      | 1232 +++++++++--------
 gcc/config/arm/mve.md                         |   48 +-
 gcc/config/arm/vfp.md                         |    8 +-
 .../arm/mve/intrinsics/vabavq_p_s16.c         |   40 +-
 .../arm/mve/intrinsics/vabavq_p_s32.c         |   40 +-
 .../arm/mve/intrinsics/vabavq_p_s8.c          |   40 +-
 .../arm/mve/intrinsics/vabavq_p_u16.c         |   40 +-
 .../arm/mve/intrinsics/vabavq_p_u32.c         |   40 +-
 .../arm/mve/intrinsics/vabavq_p_u8.c          |   40 +-
 .../arm/mve/intrinsics/vabavq_s16.c           |   28 +-
 .../arm/mve/intrinsics/vabavq_s32.c           |   28 +-
 .../gcc.target/arm/mve/intrinsics/vabavq_s8.c |   28 +-
 .../arm/mve/intrinsics/vabavq_u16.c           |   28 +-
 .../arm/mve/intrinsics/vabavq_u32.c           |   28 +-
 .../gcc.target/arm/mve/intrinsics/vabavq_u8.c |   28 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_f16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_f32.c |   16 +-
 .../arm/mve/intrinsics/vabdq_m_f16.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_m_f32.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_m_s16.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_m_s32.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_m_s8.c           |   26 +-
 .../arm/mve/intrinsics/vabdq_m_u16.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_m_u32.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_m_u8.c           |   26 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_s16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_s32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_s8.c  |   16 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_u16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_u32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vabdq_u8.c  |   16 +-
 .../arm/mve/intrinsics/vabdq_x_f16.c          |   25 +-
 .../arm/mve/intrinsics/vabdq_x_f32.c          |   25 +-
 .../arm/mve/intrinsics/vabdq_x_s16.c          |   26 +-
 .../arm/mve/intrinsics/vabdq_x_s32.c          |   25 +-
 .../arm/mve/intrinsics/vabdq_x_s8.c           |   25 +-
 .../arm/mve/intrinsics/vabdq_x_u16.c          |   25 +-
 .../arm/mve/intrinsics/vabdq_x_u32.c          |   25 +-
 .../arm/mve/intrinsics/vabdq_x_u8.c           |   25 +-
 .../gcc.target/arm/mve/intrinsics/vabsq_f16.c |   22 +-
 .../gcc.target/arm/mve/intrinsics/vabsq_f32.c |   22 +-
 .../arm/mve/intrinsics/vabsq_m_f16.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_m_f32.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_m_s16.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_m_s32.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_m_s8.c           |   25 +-
 .../gcc.target/arm/mve/intrinsics/vabsq_s16.c |   20 +-
 .../gcc.target/arm/mve/intrinsics/vabsq_s32.c |   20 +-
 .../gcc.target/arm/mve/intrinsics/vabsq_s8.c  |   16 +-
 .../arm/mve/intrinsics/vabsq_x_f16.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_x_f32.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_x_s16.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_x_s32.c          |   25 +-
 .../arm/mve/intrinsics/vabsq_x_s8.c           |   25 +-
 .../arm/mve/intrinsics/vaddlvaq_p_s32.c       |   24 +-
 .../arm/mve/intrinsics/vaddlvaq_p_u32.c       |   40 +-
 .../arm/mve/intrinsics/vaddlvaq_s32.c         |   16 +-
 .../arm/mve/intrinsics/vaddlvaq_u32.c         |   28 +-
 .../arm/mve/intrinsics/vaddlvq_p_s32.c        |   24 +-
 .../arm/mve/intrinsics/vaddlvq_p_u32.c        |   24 +-
 .../arm/mve/intrinsics/vaddlvq_s32.c          |   22 +-
 .../arm/mve/intrinsics/vaddlvq_u32.c          |   20 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_f16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_f32.c |   16 +-
 .../arm/mve/intrinsics/vaddq_m_f16.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_m_f32.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_m_n_f16.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_m_n_f32.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_m_n_s16.c        |   26 +-
 .../arm/mve/intrinsics/vaddq_m_n_s32.c        |   26 +-
 .../arm/mve/intrinsics/vaddq_m_n_s8.c         |   26 +-
 .../arm/mve/intrinsics/vaddq_m_n_u16.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_m_n_u32.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_m_n_u8.c         |   42 +-
 .../arm/mve/intrinsics/vaddq_m_s16.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_m_s32.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_m_s8.c           |   26 +-
 .../arm/mve/intrinsics/vaddq_m_u16.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_m_u32.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_m_u8.c           |   26 +-
 .../arm/mve/intrinsics/vaddq_n_f16.c          |   28 +-
 .../arm/mve/intrinsics/vaddq_n_f32.c          |   28 +-
 .../arm/mve/intrinsics/vaddq_n_s16.c          |   16 +-
 .../arm/mve/intrinsics/vaddq_n_s32.c          |   16 +-
 .../arm/mve/intrinsics/vaddq_n_s8.c           |   16 +-
 .../arm/mve/intrinsics/vaddq_n_u16.c          |   28 +-
 .../arm/mve/intrinsics/vaddq_n_u32.c          |   28 +-
 .../arm/mve/intrinsics/vaddq_n_u8.c           |   28 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_s16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_s32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_s8.c  |   16 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_u16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_u32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vaddq_u8.c  |   16 +-
 .../arm/mve/intrinsics/vaddq_x_f16.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_x_f32.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_x_n_f16.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_x_n_f32.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_x_n_s16.c        |   26 +-
 .../arm/mve/intrinsics/vaddq_x_n_s32.c        |   26 +-
 .../arm/mve/intrinsics/vaddq_x_n_s8.c         |   26 +-
 .../arm/mve/intrinsics/vaddq_x_n_u16.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_x_n_u32.c        |   42 +-
 .../arm/mve/intrinsics/vaddq_x_n_u8.c         |   42 +-
 .../arm/mve/intrinsics/vaddq_x_s16.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_x_s32.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_x_s8.c           |   26 +-
 .../arm/mve/intrinsics/vaddq_x_u16.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_x_u32.c          |   26 +-
 .../arm/mve/intrinsics/vaddq_x_u8.c           |   26 +-
 .../arm/mve/intrinsics/vaddvaq_p_s16.c        |   24 +-
 .../arm/mve/intrinsics/vaddvaq_p_s32.c        |   24 +-
 .../arm/mve/intrinsics/vaddvaq_p_s8.c         |   24 +-
 .../arm/mve/intrinsics/vaddvaq_p_u16.c        |   40 +-
 .../arm/mve/intrinsics/vaddvaq_p_u32.c        |   40 +-
 .../arm/mve/intrinsics/vaddvaq_p_u8.c         |   40 +-
 .../arm/mve/intrinsics/vaddvaq_s16.c          |   16 +-
 .../arm/mve/intrinsics/vaddvaq_s32.c          |   16 +-
 .../arm/mve/intrinsics/vaddvaq_s8.c           |   16 +-
 .../arm/mve/intrinsics/vaddvaq_u16.c          |   28 +-
 .../arm/mve/intrinsics/vaddvaq_u32.c          |   28 +-
 .../arm/mve/intrinsics/vaddvaq_u8.c           |   28 +-
 .../arm/mve/intrinsics/vaddvq_p_s16.c         |   24 +-
 .../arm/mve/intrinsics/vaddvq_p_s32.c         |   24 +-
 .../arm/mve/intrinsics/vaddvq_p_s8.c          |   24 +-
 .../arm/mve/intrinsics/vaddvq_p_u16.c         |   24 +-
 .../arm/mve/intrinsics/vaddvq_p_u32.c         |   24 +-
 .../arm/mve/intrinsics/vaddvq_p_u8.c          |   24 +-
 .../arm/mve/intrinsics/vaddvq_s16.c           |   22 +-
 .../arm/mve/intrinsics/vaddvq_s32.c           |   22 +-
 .../gcc.target/arm/mve/intrinsics/vaddvq_s8.c |   20 +-
 .../arm/mve/intrinsics/vaddvq_u16.c           |   20 +-
 .../arm/mve/intrinsics/vaddvq_u32.c           |   20 +-
 .../gcc.target/arm/mve/intrinsics/vaddvq_u8.c |   20 +-
 .../arm/mve/intrinsics/vcmpcsq_m_n_u16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpcsq_m_n_u32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpcsq_m_n_u8.c       |   47 +-
 .../arm/mve/intrinsics/vcmpcsq_m_u16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpcsq_m_u32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpcsq_m_u8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpcsq_n_u16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpcsq_n_u32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpcsq_n_u8.c         |   34 +-
 .../arm/mve/intrinsics/vcmpcsq_u16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpcsq_u32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpcsq_u8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_f16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_f32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_m_f16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_f32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_f16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_f32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_s16.c      |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_s32.c      |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_s8.c       |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_u16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_u32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpeqq_m_n_u8.c       |   47 +-
 .../arm/mve/intrinsics/vcmpeqq_m_s16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_s32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_s8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_u16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_u32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_m_u8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpeqq_n_f16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpeqq_n_f32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpeqq_n_s16.c        |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_n_s32.c        |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_n_s8.c         |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_n_u16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpeqq_n_u32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpeqq_n_u8.c         |   34 +-
 .../arm/mve/intrinsics/vcmpeqq_s16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_s32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_s8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_u16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_u32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpeqq_u8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_f16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_f32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_m_f16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_f32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_n_f16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpgeq_m_n_f32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpgeq_m_n_s16.c      |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_n_s32.c      |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_n_s8.c       |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_s16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_s32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_m_s8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpgeq_n_f16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpgeq_n_f32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpgeq_n_s16.c        |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_n_s32.c        |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_n_s8.c         |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_s16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_s32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgeq_s8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_f16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_f32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_m_f16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_f32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_n_f16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpgtq_m_n_f32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpgtq_m_n_s16.c      |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_n_s32.c      |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_n_s8.c       |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_s16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_s32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_m_s8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpgtq_n_f16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpgtq_n_f32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpgtq_n_s16.c        |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_n_s32.c        |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_n_s8.c         |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_s16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_s32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpgtq_s8.c           |   20 +-
 .../arm/mve/intrinsics/vcmphiq_m_n_u16.c      |   47 +-
 .../arm/mve/intrinsics/vcmphiq_m_n_u32.c      |   47 +-
 .../arm/mve/intrinsics/vcmphiq_m_n_u8.c       |   47 +-
 .../arm/mve/intrinsics/vcmphiq_m_u16.c        |   29 +-
 .../arm/mve/intrinsics/vcmphiq_m_u32.c        |   29 +-
 .../arm/mve/intrinsics/vcmphiq_m_u8.c         |   29 +-
 .../arm/mve/intrinsics/vcmphiq_n_u16.c        |   34 +-
 .../arm/mve/intrinsics/vcmphiq_n_u32.c        |   34 +-
 .../arm/mve/intrinsics/vcmphiq_n_u8.c         |   34 +-
 .../arm/mve/intrinsics/vcmphiq_u16.c          |   20 +-
 .../arm/mve/intrinsics/vcmphiq_u32.c          |   20 +-
 .../arm/mve/intrinsics/vcmphiq_u8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpleq_f16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpleq_f32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpleq_m_f16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_f32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_n_f16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpleq_m_n_f32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpleq_m_n_s16.c      |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_n_s32.c      |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_n_s8.c       |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_s16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_s32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpleq_m_s8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpleq_n_f16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpleq_n_f32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpleq_n_s16.c        |   20 +-
 .../arm/mve/intrinsics/vcmpleq_n_s32.c        |   20 +-
 .../arm/mve/intrinsics/vcmpleq_n_s8.c         |   20 +-
 .../arm/mve/intrinsics/vcmpleq_s16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpleq_s32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpleq_s8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpltq_f16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpltq_f32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpltq_m_f16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_f32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_n_f16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpltq_m_n_f32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpltq_m_n_s16.c      |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_n_s32.c      |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_n_s8.c       |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_s16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_s32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpltq_m_s8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpltq_n_f16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpltq_n_f32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpltq_n_s16.c        |   20 +-
 .../arm/mve/intrinsics/vcmpltq_n_s32.c        |   20 +-
 .../arm/mve/intrinsics/vcmpltq_n_s8.c         |   20 +-
 .../arm/mve/intrinsics/vcmpltq_s16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpltq_s32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpltq_s8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpneq_f16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpneq_f32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpneq_m_f16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_f32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_f16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_f32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_s16.c      |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_s32.c      |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_s8.c       |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_u16.c      |   47 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_u32.c      |   47 +-
 .../arm/mve/intrinsics/vcmpneq_m_n_u8.c       |   47 +-
 .../arm/mve/intrinsics/vcmpneq_m_s16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_s32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_s8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_u16.c        |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_u32.c        |   29 +-
 .../arm/mve/intrinsics/vcmpneq_m_u8.c         |   29 +-
 .../arm/mve/intrinsics/vcmpneq_n_f16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpneq_n_f32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpneq_n_s16.c        |   20 +-
 .../arm/mve/intrinsics/vcmpneq_n_s32.c        |   20 +-
 .../arm/mve/intrinsics/vcmpneq_n_s8.c         |   20 +-
 .../arm/mve/intrinsics/vcmpneq_n_u16.c        |   34 +-
 .../arm/mve/intrinsics/vcmpneq_n_u32.c        |   34 +-
 .../arm/mve/intrinsics/vcmpneq_n_u8.c         |   34 +-
 .../arm/mve/intrinsics/vcmpneq_s16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpneq_s32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpneq_s8.c           |   20 +-
 .../arm/mve/intrinsics/vcmpneq_u16.c          |   20 +-
 .../arm/mve/intrinsics/vcmpneq_u32.c          |   20 +-
 .../arm/mve/intrinsics/vcmpneq_u8.c           |   20 +-
 .../arm/mve/intrinsics/vcreateq_f16.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_f32.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_s16.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_s32.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_s64.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_s8.c          |   23 +-
 .../arm/mve/intrinsics/vcreateq_u16.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_u32.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_u64.c         |   23 +-
 .../arm/mve/intrinsics/vcreateq_u8.c          |   23 +-
 .../arm/mve/intrinsics/vddupq_m_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vddupq_m_n_u32.c       |   46 +-
 .../arm/mve/intrinsics/vddupq_m_n_u8.c        |   46 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u16.c      |   42 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u32.c      |   46 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u8.c       |   46 +-
 .../arm/mve/intrinsics/vddupq_n_u16.c         |   32 +-
 .../arm/mve/intrinsics/vddupq_n_u32.c         |   28 +-
 .../arm/mve/intrinsics/vddupq_n_u8.c          |   28 +-
 .../arm/mve/intrinsics/vddupq_wb_u16.c        |   32 +-
 .../arm/mve/intrinsics/vddupq_wb_u32.c        |   28 +-
 .../arm/mve/intrinsics/vddupq_wb_u8.c         |   28 +-
 .../arm/mve/intrinsics/vddupq_x_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vddupq_x_n_u32.c       |   46 +-
 .../arm/mve/intrinsics/vddupq_x_n_u8.c        |   46 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u16.c      |   52 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u32.c      |   52 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u8.c       |   52 +-
 .../arm/mve/intrinsics/vdupq_m_n_f16.c        |   41 +-
 .../arm/mve/intrinsics/vdupq_m_n_f32.c        |   41 +-
 .../arm/mve/intrinsics/vdupq_m_n_s16.c        |   25 +-
 .../arm/mve/intrinsics/vdupq_m_n_s32.c        |   25 +-
 .../arm/mve/intrinsics/vdupq_m_n_s8.c         |   25 +-
 .../arm/mve/intrinsics/vdupq_m_n_u16.c        |   41 +-
 .../arm/mve/intrinsics/vdupq_m_n_u32.c        |   41 +-
 .../arm/mve/intrinsics/vdupq_m_n_u8.c         |   41 +-
 .../arm/mve/intrinsics/vdupq_n_f16.c          |   21 +-
 .../arm/mve/intrinsics/vdupq_n_f32.c          |   21 +-
 .../arm/mve/intrinsics/vdupq_n_s16.c          |   13 +-
 .../arm/mve/intrinsics/vdupq_n_s32.c          |   13 +-
 .../arm/mve/intrinsics/vdupq_n_s8.c           |    9 +-
 .../arm/mve/intrinsics/vdupq_n_u16.c          |   23 +-
 .../arm/mve/intrinsics/vdupq_n_u32.c          |   23 +-
 .../arm/mve/intrinsics/vdupq_n_u8.c           |   23 +-
 .../arm/mve/intrinsics/vdupq_x_n_f16.c        |   30 +-
 .../arm/mve/intrinsics/vdupq_x_n_f32.c        |   30 +-
 .../arm/mve/intrinsics/vdupq_x_n_s16.c        |   14 +-
 .../arm/mve/intrinsics/vdupq_x_n_s32.c        |   14 +-
 .../arm/mve/intrinsics/vdupq_x_n_s8.c         |   14 +-
 .../arm/mve/intrinsics/vdupq_x_n_u16.c        |   30 +-
 .../arm/mve/intrinsics/vdupq_x_n_u32.c        |   30 +-
 .../arm/mve/intrinsics/vdupq_x_n_u8.c         |   30 +-
 .../arm/mve/intrinsics/vdwdupq_m_n_u16.c      |   44 +-
 .../arm/mve/intrinsics/vdwdupq_m_n_u32.c      |   46 +-
 .../arm/mve/intrinsics/vdwdupq_m_n_u8.c       |   46 +-
 .../arm/mve/intrinsics/vdwdupq_m_wb_u16.c     |   50 +-
 .../arm/mve/intrinsics/vdwdupq_m_wb_u32.c     |   48 +-
 .../arm/mve/intrinsics/vdwdupq_m_wb_u8.c      |   50 +-
 .../arm/mve/intrinsics/vdwdupq_n_u16.c        |   32 +-
 .../arm/mve/intrinsics/vdwdupq_n_u32.c        |   32 +-
 .../arm/mve/intrinsics/vdwdupq_n_u8.c         |   32 +-
 .../arm/mve/intrinsics/vdwdupq_wb_u16.c       |   32 +-
 .../arm/mve/intrinsics/vdwdupq_wb_u32.c       |   32 +-
 .../arm/mve/intrinsics/vdwdupq_wb_u8.c        |   32 +-
 .../arm/mve/intrinsics/vdwdupq_x_n_u16.c      |   42 +-
 .../arm/mve/intrinsics/vdwdupq_x_n_u32.c      |   46 +-
 .../arm/mve/intrinsics/vdwdupq_x_n_u8.c       |   46 +-
 .../arm/mve/intrinsics/vdwdupq_x_wb_u16.c     |   50 +-
 .../arm/mve/intrinsics/vdwdupq_x_wb_u32.c     |   46 +-
 .../arm/mve/intrinsics/vdwdupq_x_wb_u8.c      |   50 +-
 .../arm/mve/intrinsics/vfmasq_m_n_f16.c       |   50 +-
 .../arm/mve/intrinsics/vfmasq_m_n_f32.c       |   50 +-
 .../arm/mve/intrinsics/vhaddq_m_n_s16.c       |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_n_s32.c       |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_n_s8.c        |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vhaddq_m_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vhaddq_m_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vhaddq_m_s16.c         |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_s32.c         |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_s8.c          |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_u16.c         |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_u32.c         |   26 +-
 .../arm/mve/intrinsics/vhaddq_m_u8.c          |   26 +-
 .../arm/mve/intrinsics/vhaddq_n_s16.c         |   16 +-
 .../arm/mve/intrinsics/vhaddq_n_s32.c         |   16 +-
 .../arm/mve/intrinsics/vhaddq_n_s8.c          |   16 +-
 .../arm/mve/intrinsics/vhaddq_n_u16.c         |   28 +-
 .../arm/mve/intrinsics/vhaddq_n_u32.c         |   28 +-
 .../arm/mve/intrinsics/vhaddq_n_u8.c          |   28 +-
 .../arm/mve/intrinsics/vhaddq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vhaddq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vhaddq_s8.c |   16 +-
 .../arm/mve/intrinsics/vhaddq_u16.c           |   16 +-
 .../arm/mve/intrinsics/vhaddq_u32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vhaddq_u8.c |   16 +-
 .../arm/mve/intrinsics/vhaddq_x_n_s16.c       |   26 +-
 .../arm/mve/intrinsics/vhaddq_x_n_s32.c       |   26 +-
 .../arm/mve/intrinsics/vhaddq_x_n_s8.c        |   26 +-
 .../arm/mve/intrinsics/vhaddq_x_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vhaddq_x_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vhaddq_x_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vhaddq_x_s16.c         |   25 +-
 .../arm/mve/intrinsics/vhaddq_x_s32.c         |   25 +-
 .../arm/mve/intrinsics/vhaddq_x_s8.c          |   25 +-
 .../arm/mve/intrinsics/vhaddq_x_u16.c         |   25 +-
 .../arm/mve/intrinsics/vhaddq_x_u32.c         |   25 +-
 .../arm/mve/intrinsics/vhaddq_x_u8.c          |   25 +-
 .../arm/mve/intrinsics/vhsubq_m_n_s16.c       |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_n_s32.c       |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_n_s8.c        |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vhsubq_m_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vhsubq_m_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vhsubq_m_s16.c         |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_s32.c         |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_s8.c          |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_u16.c         |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_u32.c         |   26 +-
 .../arm/mve/intrinsics/vhsubq_m_u8.c          |   26 +-
 .../arm/mve/intrinsics/vhsubq_n_s16.c         |   16 +-
 .../arm/mve/intrinsics/vhsubq_n_s32.c         |   16 +-
 .../arm/mve/intrinsics/vhsubq_n_s8.c          |   16 +-
 .../arm/mve/intrinsics/vhsubq_n_u16.c         |   28 +-
 .../arm/mve/intrinsics/vhsubq_n_u32.c         |   28 +-
 .../arm/mve/intrinsics/vhsubq_n_u8.c          |   28 +-
 .../arm/mve/intrinsics/vhsubq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vhsubq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vhsubq_s8.c |   16 +-
 .../arm/mve/intrinsics/vhsubq_u16.c           |   16 +-
 .../arm/mve/intrinsics/vhsubq_u32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vhsubq_u8.c |   16 +-
 .../arm/mve/intrinsics/vhsubq_x_n_s16.c       |   26 +-
 .../arm/mve/intrinsics/vhsubq_x_n_s32.c       |   26 +-
 .../arm/mve/intrinsics/vhsubq_x_n_s8.c        |   26 +-
 .../arm/mve/intrinsics/vhsubq_x_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vhsubq_x_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vhsubq_x_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vhsubq_x_s16.c         |   25 +-
 .../arm/mve/intrinsics/vhsubq_x_s32.c         |   25 +-
 .../arm/mve/intrinsics/vhsubq_x_s8.c          |   25 +-
 .../arm/mve/intrinsics/vhsubq_x_u16.c         |   25 +-
 .../arm/mve/intrinsics/vhsubq_x_u32.c         |   25 +-
 .../arm/mve/intrinsics/vhsubq_x_u8.c          |   25 +-
 .../arm/mve/intrinsics/vidupq_m_n_u16.c       |   46 +-
 .../arm/mve/intrinsics/vidupq_m_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vidupq_m_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u16.c      |   46 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u32.c      |   42 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u8.c       |   42 +-
 .../arm/mve/intrinsics/vidupq_n_u16.c         |   32 +-
 .../arm/mve/intrinsics/vidupq_n_u32.c         |   28 +-
 .../arm/mve/intrinsics/vidupq_n_u8.c          |   28 +-
 .../arm/mve/intrinsics/vidupq_wb_u16.c        |   32 +-
 .../arm/mve/intrinsics/vidupq_wb_u32.c        |   28 +-
 .../arm/mve/intrinsics/vidupq_wb_u8.c         |   28 +-
 .../arm/mve/intrinsics/vidupq_x_n_u16.c       |   46 +-
 .../arm/mve/intrinsics/vidupq_x_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vidupq_x_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vidupq_x_wb_u16.c      |   52 +-
 .../arm/mve/intrinsics/vidupq_x_wb_u32.c      |   52 +-
 .../arm/mve/intrinsics/vidupq_x_wb_u8.c       |   52 +-
 .../arm/mve/intrinsics/viwdupq_m_n_u16.c      |   46 +-
 .../arm/mve/intrinsics/viwdupq_m_n_u32.c      |   46 +-
 .../arm/mve/intrinsics/viwdupq_m_n_u8.c       |   46 +-
 .../arm/mve/intrinsics/viwdupq_m_wb_u16.c     |   46 +-
 .../arm/mve/intrinsics/viwdupq_m_wb_u32.c     |   46 +-
 .../arm/mve/intrinsics/viwdupq_m_wb_u8.c      |   46 +-
 .../arm/mve/intrinsics/viwdupq_n_u16.c        |   32 +-
 .../arm/mve/intrinsics/viwdupq_n_u32.c        |   32 +-
 .../arm/mve/intrinsics/viwdupq_n_u8.c         |   28 +-
 .../arm/mve/intrinsics/viwdupq_wb_u16.c       |   36 +-
 .../arm/mve/intrinsics/viwdupq_wb_u32.c       |   36 +-
 .../arm/mve/intrinsics/viwdupq_wb_u8.c        |   36 +-
 .../arm/mve/intrinsics/viwdupq_x_n_u16.c      |   46 +-
 .../arm/mve/intrinsics/viwdupq_x_n_u32.c      |   46 +-
 .../arm/mve/intrinsics/viwdupq_x_n_u8.c       |   46 +-
 .../arm/mve/intrinsics/viwdupq_x_wb_u16.c     |   50 +-
 .../arm/mve/intrinsics/viwdupq_x_wb_u32.c     |   50 +-
 .../arm/mve/intrinsics/viwdupq_x_wb_u8.c      |   50 +-
 .../intrinsics/vldrwq_gather_base_wb_z_f32.c  |    2 +-
 .../intrinsics/vldrwq_gather_base_wb_z_s32.c  |    2 +-
 .../intrinsics/vldrwq_gather_base_wb_z_u32.c  |    2 +-
 .../arm/mve/intrinsics/vmaxaq_m_s16.c         |   25 +-
 .../arm/mve/intrinsics/vmaxaq_m_s32.c         |   25 +-
 .../arm/mve/intrinsics/vmaxaq_m_s8.c          |   25 +-
 .../arm/mve/intrinsics/vmaxaq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vmaxaq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmaxaq_s8.c |   16 +-
 .../arm/mve/intrinsics/vmaxavq_p_s16.c        |   41 +-
 .../arm/mve/intrinsics/vmaxavq_p_s32.c        |   41 +-
 .../arm/mve/intrinsics/vmaxavq_p_s8.c         |   41 +-
 .../arm/mve/intrinsics/vmaxavq_s16.c          |   29 +-
 .../arm/mve/intrinsics/vmaxavq_s32.c          |   29 +-
 .../arm/mve/intrinsics/vmaxavq_s8.c           |   29 +-
 .../arm/mve/intrinsics/vmaxnmaq_f16.c         |   16 +-
 .../arm/mve/intrinsics/vmaxnmaq_f32.c         |   16 +-
 .../arm/mve/intrinsics/vmaxnmaq_m_f16.c       |   25 +-
 .../arm/mve/intrinsics/vmaxnmaq_m_f32.c       |   25 +-
 .../arm/mve/intrinsics/vmaxnmavq_f16.c        |   27 +-
 .../arm/mve/intrinsics/vmaxnmavq_f32.c        |   27 +-
 .../arm/mve/intrinsics/vmaxnmavq_p_f16.c      |   39 +-
 .../arm/mve/intrinsics/vmaxnmavq_p_f32.c      |   39 +-
 .../arm/mve/intrinsics/vmaxnmq_f16.c          |   16 +-
 .../arm/mve/intrinsics/vmaxnmq_f32.c          |   16 +-
 .../arm/mve/intrinsics/vmaxnmq_m_f16.c        |   26 +-
 .../arm/mve/intrinsics/vmaxnmq_m_f32.c        |   26 +-
 .../arm/mve/intrinsics/vmaxnmq_x_f16.c        |   25 +-
 .../arm/mve/intrinsics/vmaxnmq_x_f32.c        |   25 +-
 .../arm/mve/intrinsics/vmaxnmvq_f16.c         |   27 +-
 .../arm/mve/intrinsics/vmaxnmvq_f32.c         |   27 +-
 .../arm/mve/intrinsics/vmaxnmvq_p_f16.c       |   39 +-
 .../arm/mve/intrinsics/vmaxnmvq_p_f32.c       |   39 +-
 .../arm/mve/intrinsics/vmaxq_m_s16.c          |   26 +-
 .../arm/mve/intrinsics/vmaxq_m_s32.c          |   26 +-
 .../arm/mve/intrinsics/vmaxq_m_s8.c           |   26 +-
 .../arm/mve/intrinsics/vmaxq_m_u16.c          |   26 +-
 .../arm/mve/intrinsics/vmaxq_m_u32.c          |   26 +-
 .../arm/mve/intrinsics/vmaxq_m_u8.c           |   26 +-
 .../gcc.target/arm/mve/intrinsics/vmaxq_s16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmaxq_s32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmaxq_s8.c  |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmaxq_u16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmaxq_u32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmaxq_u8.c  |   16 +-
 .../arm/mve/intrinsics/vmaxq_x_s16.c          |   25 +-
 .../arm/mve/intrinsics/vmaxq_x_s32.c          |   25 +-
 .../arm/mve/intrinsics/vmaxq_x_s8.c           |   25 +-
 .../arm/mve/intrinsics/vmaxq_x_u16.c          |   25 +-
 .../arm/mve/intrinsics/vmaxq_x_u32.c          |   25 +-
 .../arm/mve/intrinsics/vmaxq_x_u8.c           |   25 +-
 .../arm/mve/intrinsics/vmaxvq_p_s16.c         |   31 +-
 .../arm/mve/intrinsics/vmaxvq_p_s32.c         |   31 +-
 .../arm/mve/intrinsics/vmaxvq_p_s8.c          |   31 +-
 .../arm/mve/intrinsics/vmaxvq_p_u16.c         |   39 +-
 .../arm/mve/intrinsics/vmaxvq_p_u32.c         |   39 +-
 .../arm/mve/intrinsics/vmaxvq_p_u8.c          |   39 +-
 .../arm/mve/intrinsics/vmaxvq_s16.c           |   23 +-
 .../arm/mve/intrinsics/vmaxvq_s32.c           |   23 +-
 .../gcc.target/arm/mve/intrinsics/vmaxvq_s8.c |   23 +-
 .../arm/mve/intrinsics/vmaxvq_u16.c           |   27 +-
 .../arm/mve/intrinsics/vmaxvq_u32.c           |   27 +-
 .../gcc.target/arm/mve/intrinsics/vmaxvq_u8.c |   27 +-
 .../arm/mve/intrinsics/vminaq_m_s16.c         |   25 +-
 .../arm/mve/intrinsics/vminaq_m_s32.c         |   25 +-
 .../arm/mve/intrinsics/vminaq_m_s8.c          |   25 +-
 .../arm/mve/intrinsics/vminaq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vminaq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vminaq_s8.c |   16 +-
 .../arm/mve/intrinsics/vminavq_p_s16.c        |   41 +-
 .../arm/mve/intrinsics/vminavq_p_s32.c        |   41 +-
 .../arm/mve/intrinsics/vminavq_p_s8.c         |   41 +-
 .../arm/mve/intrinsics/vminavq_s16.c          |   29 +-
 .../arm/mve/intrinsics/vminavq_s32.c          |   29 +-
 .../arm/mve/intrinsics/vminavq_s8.c           |   29 +-
 .../arm/mve/intrinsics/vminnmaq_f16.c         |   16 +-
 .../arm/mve/intrinsics/vminnmaq_f32.c         |   16 +-
 .../arm/mve/intrinsics/vminnmaq_m_f16.c       |   25 +-
 .../arm/mve/intrinsics/vminnmaq_m_f32.c       |   25 +-
 .../arm/mve/intrinsics/vminnmavq_f16.c        |   27 +-
 .../arm/mve/intrinsics/vminnmavq_f32.c        |   27 +-
 .../arm/mve/intrinsics/vminnmavq_p_f16.c      |   39 +-
 .../arm/mve/intrinsics/vminnmavq_p_f32.c      |   39 +-
 .../arm/mve/intrinsics/vminnmq_f16.c          |   16 +-
 .../arm/mve/intrinsics/vminnmq_f32.c          |   16 +-
 .../arm/mve/intrinsics/vminnmq_m_f16.c        |   26 +-
 .../arm/mve/intrinsics/vminnmq_m_f32.c        |   26 +-
 .../arm/mve/intrinsics/vminnmq_x_f16.c        |   25 +-
 .../arm/mve/intrinsics/vminnmq_x_f32.c        |   25 +-
 .../arm/mve/intrinsics/vminnmvq_f16.c         |   27 +-
 .../arm/mve/intrinsics/vminnmvq_f32.c         |   27 +-
 .../arm/mve/intrinsics/vminnmvq_p_f16.c       |   39 +-
 .../arm/mve/intrinsics/vminnmvq_p_f32.c       |   39 +-
 .../arm/mve/intrinsics/vminq_m_s16.c          |   26 +-
 .../arm/mve/intrinsics/vminq_m_s32.c          |   26 +-
 .../arm/mve/intrinsics/vminq_m_s8.c           |   26 +-
 .../arm/mve/intrinsics/vminq_m_u16.c          |   26 +-
 .../arm/mve/intrinsics/vminq_m_u32.c          |   26 +-
 .../arm/mve/intrinsics/vminq_m_u8.c           |   26 +-
 .../gcc.target/arm/mve/intrinsics/vminq_s16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vminq_s32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vminq_s8.c  |   16 +-
 .../gcc.target/arm/mve/intrinsics/vminq_u16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vminq_u32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vminq_u8.c  |   16 +-
 .../arm/mve/intrinsics/vminq_x_s16.c          |   25 +-
 .../arm/mve/intrinsics/vminq_x_s32.c          |   25 +-
 .../arm/mve/intrinsics/vminq_x_s8.c           |   25 +-
 .../arm/mve/intrinsics/vminq_x_u16.c          |   25 +-
 .../arm/mve/intrinsics/vminq_x_u32.c          |   25 +-
 .../arm/mve/intrinsics/vminq_x_u8.c           |   25 +-
 .../arm/mve/intrinsics/vminvq_p_s16.c         |   31 +-
 .../arm/mve/intrinsics/vminvq_p_s32.c         |   31 +-
 .../arm/mve/intrinsics/vminvq_p_s8.c          |   31 +-
 .../arm/mve/intrinsics/vminvq_p_u16.c         |   39 +-
 .../arm/mve/intrinsics/vminvq_p_u32.c         |   39 +-
 .../arm/mve/intrinsics/vminvq_p_u8.c          |   39 +-
 .../arm/mve/intrinsics/vminvq_s16.c           |   22 +-
 .../arm/mve/intrinsics/vminvq_s32.c           |   22 +-
 .../gcc.target/arm/mve/intrinsics/vminvq_s8.c |   22 +-
 .../arm/mve/intrinsics/vminvq_u16.c           |   29 +-
 .../arm/mve/intrinsics/vminvq_u32.c           |   26 +-
 .../gcc.target/arm/mve/intrinsics/vminvq_u8.c |   29 +-
 .../arm/mve/intrinsics/vmladavaq_p_s16.c      |   33 +-
 .../arm/mve/intrinsics/vmladavaq_p_s32.c      |   33 +-
 .../arm/mve/intrinsics/vmladavaq_p_s8.c       |   33 +-
 .../arm/mve/intrinsics/vmladavaq_p_u16.c      |   49 +-
 .../arm/mve/intrinsics/vmladavaq_p_u32.c      |   49 +-
 .../arm/mve/intrinsics/vmladavaq_p_u8.c       |   49 +-
 .../arm/mve/intrinsics/vmladavaxq_p_s16.c     |   33 +-
 .../arm/mve/intrinsics/vmladavaxq_p_s32.c     |   33 +-
 .../arm/mve/intrinsics/vmladavaxq_p_s8.c      |   33 +-
 .../arm/mve/intrinsics/vmladavaxq_s16.c       |   24 +-
 .../arm/mve/intrinsics/vmladavaxq_s32.c       |   24 +-
 .../arm/mve/intrinsics/vmladavaxq_s8.c        |   24 +-
 .../arm/mve/intrinsics/vmlaldavaxq_p_s16.c    |   32 +-
 .../arm/mve/intrinsics/vmlaldavaxq_p_s32.c    |   32 +-
 .../arm/mve/intrinsics/vmlaldavaxq_s16.c      |   24 +-
 .../arm/mve/intrinsics/vmlaldavaxq_s32.c      |   24 +-
 .../arm/mve/intrinsics/vmlasq_m_n_s16.c       |   34 +-
 .../arm/mve/intrinsics/vmlasq_m_n_s32.c       |   34 +-
 .../arm/mve/intrinsics/vmlasq_m_n_s8.c        |   34 +-
 .../arm/mve/intrinsics/vmlasq_m_n_u16.c       |   50 +-
 .../arm/mve/intrinsics/vmlasq_m_n_u32.c       |   50 +-
 .../arm/mve/intrinsics/vmlasq_m_n_u8.c        |   50 +-
 .../arm/mve/intrinsics/vmlasq_n_s16.c         |   24 +-
 .../arm/mve/intrinsics/vmlasq_n_s32.c         |   24 +-
 .../arm/mve/intrinsics/vmlasq_n_s8.c          |   24 +-
 .../arm/mve/intrinsics/vmlasq_n_u16.c         |   36 +-
 .../arm/mve/intrinsics/vmlasq_n_u32.c         |   36 +-
 .../arm/mve/intrinsics/vmlasq_n_u8.c          |   36 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_f16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_f32.c |   16 +-
 .../arm/mve/intrinsics/vmulq_m_f16.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_m_f32.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_m_n_f16.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_m_n_f32.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_m_n_s16.c        |   26 +-
 .../arm/mve/intrinsics/vmulq_m_n_s32.c        |   26 +-
 .../arm/mve/intrinsics/vmulq_m_n_s8.c         |   26 +-
 .../arm/mve/intrinsics/vmulq_m_n_u16.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_m_n_u32.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_m_n_u8.c         |   42 +-
 .../arm/mve/intrinsics/vmulq_m_s16.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_m_s32.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_m_s8.c           |   26 +-
 .../arm/mve/intrinsics/vmulq_m_u16.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_m_u32.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_m_u8.c           |   26 +-
 .../arm/mve/intrinsics/vmulq_n_f16.c          |   28 +-
 .../arm/mve/intrinsics/vmulq_n_f32.c          |   28 +-
 .../arm/mve/intrinsics/vmulq_n_s16.c          |   16 +-
 .../arm/mve/intrinsics/vmulq_n_s32.c          |   16 +-
 .../arm/mve/intrinsics/vmulq_n_s8.c           |   16 +-
 .../arm/mve/intrinsics/vmulq_n_u16.c          |   28 +-
 .../arm/mve/intrinsics/vmulq_n_u32.c          |   28 +-
 .../arm/mve/intrinsics/vmulq_n_u8.c           |   28 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_s16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_s32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_s8.c  |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_u16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_u32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vmulq_u8.c  |   16 +-
 .../arm/mve/intrinsics/vmulq_x_f16.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_x_f32.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_x_n_f16.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_x_n_f32.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_x_n_s16.c        |   26 +-
 .../arm/mve/intrinsics/vmulq_x_n_s32.c        |   26 +-
 .../arm/mve/intrinsics/vmulq_x_n_s8.c         |   26 +-
 .../arm/mve/intrinsics/vmulq_x_n_u16.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_x_n_u32.c        |   42 +-
 .../arm/mve/intrinsics/vmulq_x_n_u8.c         |   42 +-
 .../arm/mve/intrinsics/vmulq_x_s16.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_x_s32.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_x_s8.c           |   26 +-
 .../arm/mve/intrinsics/vmulq_x_u16.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_x_u32.c          |   26 +-
 .../arm/mve/intrinsics/vmulq_x_u8.c           |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_n_s16.c       |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_n_s32.c       |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_n_s8.c        |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vqaddq_m_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vqaddq_m_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vqaddq_m_s16.c         |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_s32.c         |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_s8.c          |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_u16.c         |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_u32.c         |   26 +-
 .../arm/mve/intrinsics/vqaddq_m_u8.c          |   26 +-
 .../arm/mve/intrinsics/vqaddq_n_s16.c         |   16 +-
 .../arm/mve/intrinsics/vqaddq_n_s32.c         |   16 +-
 .../arm/mve/intrinsics/vqaddq_n_s8.c          |   16 +-
 .../arm/mve/intrinsics/vqaddq_n_u16.c         |   28 +-
 .../arm/mve/intrinsics/vqaddq_n_u32.c         |   28 +-
 .../arm/mve/intrinsics/vqaddq_n_u8.c          |   28 +-
 .../arm/mve/intrinsics/vqaddq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vqaddq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vqaddq_s8.c |   16 +-
 .../arm/mve/intrinsics/vqaddq_u16.c           |   16 +-
 .../arm/mve/intrinsics/vqaddq_u32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vqaddq_u8.c |   16 +-
 .../arm/mve/intrinsics/vqdmlahq_m_n_s16.c     |   34 +-
 .../arm/mve/intrinsics/vqdmlahq_m_n_s32.c     |   34 +-
 .../arm/mve/intrinsics/vqdmlahq_m_n_s8.c      |   34 +-
 .../arm/mve/intrinsics/vqdmlahq_n_s16.c       |   24 +-
 .../arm/mve/intrinsics/vqdmlahq_n_s32.c       |   24 +-
 .../arm/mve/intrinsics/vqdmlahq_n_s8.c        |   24 +-
 .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c    |   34 +-
 .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c    |   34 +-
 .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c     |   34 +-
 .../arm/mve/intrinsics/vqdmlashq_n_s16.c      |   24 +-
 .../arm/mve/intrinsics/vqdmlashq_n_s32.c      |   24 +-
 .../arm/mve/intrinsics/vqdmlashq_n_s8.c       |   24 +-
 .../arm/mve/intrinsics/vqdmulhq_m_n_s16.c     |   26 +-
 .../arm/mve/intrinsics/vqdmulhq_m_n_s32.c     |   26 +-
 .../arm/mve/intrinsics/vqdmulhq_m_n_s8.c      |   26 +-
 .../arm/mve/intrinsics/vqdmulhq_m_s16.c       |   26 +-
 .../arm/mve/intrinsics/vqdmulhq_m_s32.c       |   26 +-
 .../arm/mve/intrinsics/vqdmulhq_m_s8.c        |   26 +-
 .../arm/mve/intrinsics/vqdmulhq_n_s16.c       |   16 +-
 .../arm/mve/intrinsics/vqdmulhq_n_s32.c       |   16 +-
 .../arm/mve/intrinsics/vqdmulhq_n_s8.c        |   16 +-
 .../arm/mve/intrinsics/vqdmulhq_s16.c         |   16 +-
 .../arm/mve/intrinsics/vqdmulhq_s32.c         |   16 +-
 .../arm/mve/intrinsics/vqdmulhq_s8.c          |   16 +-
 .../arm/mve/intrinsics/vqdmullbq_m_n_s16.c    |   26 +-
 .../arm/mve/intrinsics/vqdmullbq_m_n_s32.c    |   26 +-
 .../arm/mve/intrinsics/vqdmullbq_m_s16.c      |   26 +-
 .../arm/mve/intrinsics/vqdmullbq_m_s32.c      |   26 +-
 .../arm/mve/intrinsics/vqdmullbq_n_s16.c      |   16 +-
 .../arm/mve/intrinsics/vqdmullbq_n_s32.c      |   16 +-
 .../arm/mve/intrinsics/vqdmullbq_s16.c        |   16 +-
 .../arm/mve/intrinsics/vqdmullbq_s32.c        |   16 +-
 .../arm/mve/intrinsics/vqdmulltq_m_n_s16.c    |   26 +-
 .../arm/mve/intrinsics/vqdmulltq_m_n_s32.c    |   26 +-
 .../arm/mve/intrinsics/vqdmulltq_m_s16.c      |   26 +-
 .../arm/mve/intrinsics/vqdmulltq_m_s32.c      |   26 +-
 .../arm/mve/intrinsics/vqdmulltq_n_s16.c      |   16 +-
 .../arm/mve/intrinsics/vqdmulltq_n_s32.c      |   16 +-
 .../arm/mve/intrinsics/vqdmulltq_s16.c        |   16 +-
 .../arm/mve/intrinsics/vqdmulltq_s32.c        |   16 +-
 .../arm/mve/intrinsics/vqrdmlahq_m_n_s16.c    |   34 +-
 .../arm/mve/intrinsics/vqrdmlahq_m_n_s32.c    |   34 +-
 .../arm/mve/intrinsics/vqrdmlahq_m_n_s8.c     |   34 +-
 .../arm/mve/intrinsics/vqrdmlahq_n_s16.c      |   24 +-
 .../arm/mve/intrinsics/vqrdmlahq_n_s32.c      |   24 +-
 .../arm/mve/intrinsics/vqrdmlahq_n_s8.c       |   24 +-
 .../arm/mve/intrinsics/vqrdmlashq_m_n_s16.c   |   34 +-
 .../arm/mve/intrinsics/vqrdmlashq_m_n_s32.c   |   34 +-
 .../arm/mve/intrinsics/vqrdmlashq_m_n_s8.c    |   34 +-
 .../arm/mve/intrinsics/vqsubq_m_n_s16.c       |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_n_s32.c       |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_n_s8.c        |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_n_u16.c       |   42 +-
 .../arm/mve/intrinsics/vqsubq_m_n_u32.c       |   42 +-
 .../arm/mve/intrinsics/vqsubq_m_n_u8.c        |   42 +-
 .../arm/mve/intrinsics/vqsubq_m_s16.c         |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_s32.c         |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_s8.c          |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_u16.c         |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_u32.c         |   26 +-
 .../arm/mve/intrinsics/vqsubq_m_u8.c          |   26 +-
 .../arm/mve/intrinsics/vqsubq_n_s16.c         |   16 +-
 .../arm/mve/intrinsics/vqsubq_n_s32.c         |   16 +-
 .../arm/mve/intrinsics/vqsubq_n_s8.c          |   16 +-
 .../arm/mve/intrinsics/vqsubq_n_u16.c         |   28 +-
 .../arm/mve/intrinsics/vqsubq_n_u32.c         |   28 +-
 .../arm/mve/intrinsics/vqsubq_n_u8.c          |   28 +-
 .../arm/mve/intrinsics/vqsubq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vqsubq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vqsubq_s8.c |   16 +-
 .../arm/mve/intrinsics/vqsubq_u16.c           |   16 +-
 .../arm/mve/intrinsics/vqsubq_u32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vqsubq_u8.c |   16 +-
 .../arm/mve/intrinsics/vrmlaldavhaq_p_s32.c   |   24 +-
 .../arm/mve/intrinsics/vrmlaldavhaq_p_u32.c   |   40 +-
 .../arm/mve/intrinsics/vrshlq_m_n_s16.c       |   25 +-
 .../arm/mve/intrinsics/vrshlq_m_n_s32.c       |   25 +-
 .../arm/mve/intrinsics/vrshlq_m_n_s8.c        |   25 +-
 .../arm/mve/intrinsics/vrshlq_m_n_u16.c       |   25 +-
 .../arm/mve/intrinsics/vrshlq_m_n_u32.c       |   25 +-
 .../arm/mve/intrinsics/vrshlq_m_n_u8.c        |   25 +-
 .../arm/mve/intrinsics/vrshlq_m_s16.c         |   26 +-
 .../arm/mve/intrinsics/vrshlq_m_s32.c         |   26 +-
 .../arm/mve/intrinsics/vrshlq_m_s8.c          |   26 +-
 .../arm/mve/intrinsics/vrshlq_m_u16.c         |   26 +-
 .../arm/mve/intrinsics/vrshlq_m_u32.c         |   26 +-
 .../arm/mve/intrinsics/vrshlq_m_u8.c          |   26 +-
 .../arm/mve/intrinsics/vrshlq_n_s16.c         |   16 +-
 .../arm/mve/intrinsics/vrshlq_n_s32.c         |   16 +-
 .../arm/mve/intrinsics/vrshlq_n_s8.c          |   16 +-
 .../arm/mve/intrinsics/vrshlq_n_u16.c         |   16 +-
 .../arm/mve/intrinsics/vrshlq_n_u32.c         |   16 +-
 .../arm/mve/intrinsics/vrshlq_n_u8.c          |   16 +-
 .../arm/mve/intrinsics/vrshlq_s16.c           |   16 +-
 .../arm/mve/intrinsics/vrshlq_s32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vrshlq_s8.c |   16 +-
 .../arm/mve/intrinsics/vrshlq_u16.c           |   16 +-
 .../arm/mve/intrinsics/vrshlq_u32.c           |   16 +-
 .../gcc.target/arm/mve/intrinsics/vrshlq_u8.c |   16 +-
 .../arm/mve/intrinsics/vrshlq_x_s16.c         |   25 +-
 .../arm/mve/intrinsics/vrshlq_x_s32.c         |   25 +-
 .../arm/mve/intrinsics/vrshlq_x_s8.c          |   25 +-
 .../arm/mve/intrinsics/vrshlq_x_u16.c         |   25 +-
 .../arm/mve/intrinsics/vrshlq_x_u32.c         |   25 +-
 .../arm/mve/intrinsics/vrshlq_x_u8.c          |   25 +-
 .../arm/mve/intrinsics/vsetq_lane_f16.c       |   36 +-
 .../arm/mve/intrinsics/vsetq_lane_f32.c       |   36 +-
 .../arm/mve/intrinsics/vsetq_lane_s16.c       |   24 +-
 .../arm/mve/intrinsics/vsetq_lane_s32.c       |   24 +-
 .../arm/mve/intrinsics/vsetq_lane_s64.c       |   27 +-
 .../arm/mve/intrinsics/vsetq_lane_s8.c        |   24 +-
 .../arm/mve/intrinsics/vsetq_lane_u16.c       |   36 +-
 .../arm/mve/intrinsics/vsetq_lane_u32.c       |   36 +-
 .../arm/mve/intrinsics/vsetq_lane_u64.c       |   39 +-
 .../arm/mve/intrinsics/vsetq_lane_u8.c        |   36 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_f16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_f32.c |   16 +-
 .../arm/mve/intrinsics/vsubq_m_f16.c          |   26 +-
 .../arm/mve/intrinsics/vsubq_m_f32.c          |   26 +-
 .../arm/mve/intrinsics/vsubq_m_n_f16.c        |   42 +-
 .../arm/mve/intrinsics/vsubq_m_n_f32.c        |   42 +-
 .../arm/mve/intrinsics/vsubq_m_n_s16.c        |   26 +-
 .../arm/mve/intrinsics/vsubq_m_n_s32.c        |   26 +-
 .../arm/mve/intrinsics/vsubq_m_n_s8.c         |   26 +-
 .../arm/mve/intrinsics/vsubq_m_n_u16.c        |   42 +-
 .../arm/mve/intrinsics/vsubq_m_n_u32.c        |   42 +-
 .../arm/mve/intrinsics/vsubq_m_n_u8.c         |   42 +-
 .../arm/mve/intrinsics/vsubq_m_s16.c          |   25 +-
 .../arm/mve/intrinsics/vsubq_m_s32.c          |   25 +-
 .../arm/mve/intrinsics/vsubq_m_s8.c           |   25 +-
 .../arm/mve/intrinsics/vsubq_m_u16.c          |   25 +-
 .../arm/mve/intrinsics/vsubq_m_u32.c          |   25 +-
 .../arm/mve/intrinsics/vsubq_m_u8.c           |   25 +-
 .../arm/mve/intrinsics/vsubq_n_f16.c          |   28 +-
 .../arm/mve/intrinsics/vsubq_n_f32.c          |   28 +-
 .../arm/mve/intrinsics/vsubq_n_s16.c          |   17 +-
 .../arm/mve/intrinsics/vsubq_n_s32.c          |   17 +-
 .../arm/mve/intrinsics/vsubq_n_s8.c           |   17 +-
 .../arm/mve/intrinsics/vsubq_n_u16.c          |   29 +-
 .../arm/mve/intrinsics/vsubq_n_u32.c          |   29 +-
 .../arm/mve/intrinsics/vsubq_n_u8.c           |   29 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_s16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_s32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_s8.c  |   16 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_u16.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_u32.c |   16 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_u8.c  |   16 +-
 .../arm/mve/intrinsics/vsubq_x_f16.c          |   32 +-
 .../arm/mve/intrinsics/vsubq_x_f32.c          |   32 +-
 .../arm/mve/intrinsics/vsubq_x_n_f16.c        |   48 +-
 .../arm/mve/intrinsics/vsubq_x_n_f32.c        |   48 +-
 .../arm/mve/intrinsics/vsubq_x_n_s16.c        |   32 +-
 .../arm/mve/intrinsics/vsubq_x_n_s32.c        |   32 +-
 .../arm/mve/intrinsics/vsubq_x_n_s8.c         |   32 +-
 .../arm/mve/intrinsics/vsubq_x_n_u16.c        |   48 +-
 .../arm/mve/intrinsics/vsubq_x_n_u32.c        |   48 +-
 .../arm/mve/intrinsics/vsubq_x_n_u8.c         |   48 +-
 .../arm/mve/intrinsics/vsubq_x_s16.c          |   32 +-
 .../arm/mve/intrinsics/vsubq_x_s32.c          |   32 +-
 .../arm/mve/intrinsics/vsubq_x_s8.c           |   32 +-
 .../arm/mve/intrinsics/vsubq_x_u16.c          |   32 +-
 .../arm/mve/intrinsics/vsubq_x_u32.c          |   32 +-
 .../arm/mve/intrinsics/vsubq_x_u8.c           |   32 +-
 868 files changed, 22007 insertions(+), 3613 deletions(-)

--
2.25.1

* [PATCH 01/35] arm: improve vcreateq* tests
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18  9:47   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization Andrea Corallo
                   ` (34 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vcreateq_f16.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_f32.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_s16.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_s32.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_s64.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_s8.c          | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_u16.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_u32.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_u64.c         | 23 ++++++++++++++++++-
 .../arm/mve/intrinsics/vcreateq_u8.c          | 23 ++++++++++++++++++-
 10 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
index fb3601edb94..c39303daa03 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 float16x8_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+float16x8_t
+foo1 ()
+{
+  return vcreateq_f16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
index 4f4da62eed7..ad66f4407cd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 float32x4_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+float32x4_t
+foo1 ()
+{
+  return vcreateq_f32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
index 103be6310bd..7e70a486513 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 int16x8_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+int16x8_t
+foo1 ()
+{
+  return vcreateq_s16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
index 96f7a972d93..ffcfc80ff40 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 int32x4_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+int32x4_t
+foo1 ()
+{
+  return vcreateq_s32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
index 74c554506c0..26642f9cd68 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 int64x2_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_s64 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+int64x2_t
+foo1 ()
+{
+  return vcreateq_s64 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
index 03c50a0928a..7e7e4d5948d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 int8x16_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+int8x16_t
+foo1 ()
+{
+  return vcreateq_s8 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
index 411cec8471e..858a3a4546f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 uint16x8_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+uint16x8_t
+foo1 ()
+{
+  return vcreateq_u16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
index 8bc8f60640e..5f27cf68845 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 uint32x4_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+uint32x4_t
+foo1 ()
+{
+  return vcreateq_u32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
index e74641c32f3..78553dec701 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
 uint64x2_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_u64 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+**	...
+*/
+uint64x2_t
+foo1 ()
+{
+  return vcreateq_u64 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
index de79f471d63..4a8ab61f865 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov q[0-9]+\[2\], q[0-9]+\[0\], r[0-9]+, r[0-9]+
+**	vmov q[0-9]+\[3\], q[0-9]+\[1\], r[0-9]+, r[0-9]+
+**	...
+*/
 uint8x16_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+**	...
+**	vmov q[0-9]+\[2\], q[0-9]+\[0\], r[0-9]+, r[0-9]+
+**	vmov q[0-9]+\[3\], q[0-9]+\[1\], r[0-9]+, r[0-9]+
+**	...
+*/
+uint8x16_t
+foo1 ()
+{
+  return vcreateq_u8 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
  2022-11-17 16:37 ` [PATCH 01/35] arm: improve vcreateq* tests Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:33   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 03/35] arm: improve tests and fix vddupq* Andrea Corallo
                   ` (33 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/vfp.md (*thumb2_movhi_vfp, *thumb2_movhi_fp16): Fix
	'vmsr' and 'vmrs' spacing and register capitalization.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:
	Update test.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c:
	Likewise.
---
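A minimal sketch of why the three scan-assembler patterns have to change
together with the output templates. It assumes POSIX extended regexes as a
stand-in for the Tcl regexps that scan-assembler really uses; the helper
name `matches` and the register r3 are hypothetical and only serve the
illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex PATTERN matches anywhere in
   TEXT.  The search is unanchored, as a scan-assembler search is.  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* Hypothetical assembly lines before and after the template change;
   r3 is an arbitrary register choice.  */
static const char old_asm[] = "\tvmsr\t P0, r3\t@ movhi";
static const char new_asm[] = "\tvmsr\tp0, r3\t@ movhi";

/* The two test patterns as they look once Tcl has processed the
   backslash escapes (\t becomes a tab, \[ becomes a bracket).  */
static const char old_pattern[] = "vmsr\t P0, r[0-9]+.*";
static const char new_pattern[] = "vmsr\tp0, r[0-9]+.*";
```

Under these assumptions `matches (old_pattern, new_asm)` is false and
`matches (new_pattern, new_asm)` is true, so a test left with the old
pattern would start failing once the template emits `p0` directly after
the tab.
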
 gcc/config/arm/vfp.md                                     | 8 ++++----
 .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c      | 2 +-
 .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c      | 2 +-
 .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c      | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index d0f423cc3c5..932e4b7447e 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -105,9 +105,9 @@ (define_insn "*thumb2_movhi_vfp"
     case 8:
       return "vmov%?.f32\t%0, %1\t%@ int";
     case 9:
-      return "vmsr%?\t P0, %1\t@ movhi";
+      return "vmsr%?\tp0, %1\t@ movhi";
     case 10:
-      return "vmrs%?\t %0, P0\t@ movhi";
+      return "vmrs%?\t%0, p0\t@ movhi";
     default:
       gcc_unreachable ();
     }
@@ -209,9 +209,9 @@ (define_insn "*thumb2_movhi_fp16"
     case 8:
       return "vmov%?.f32\t%0, %1\t%@ int";
     case 9:
-      return "vmsr%?\t P0, %1\t%@ movhi";
+      return "vmsr%?\tp0, %1\t%@ movhi";
     case 10:
-      return "vmrs%?\t%0, P0\t%@ movhi";
+      return "vmrs%?\t%0, p0\t%@ movhi";
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
index f3219e2e825..1e57ca40739 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
@@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
 }
 
 /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
-/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
 /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
index 4d093d243fe..f8d77fdfd5b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
@@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
 }
 
 /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
-/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
 /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
index e796522a49c..8a0e109c70c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
@@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
 }
 
 /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
-/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
 /* { dg-final { scan-assembler "vpst" } } */
 /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
 /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
-- 
2.25.1


* [PATCH 03/35] arm: improve tests and fix vddupq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
  2022-11-17 16:37 ` [PATCH 01/35] arm: improve vcreateq* tests Andrea Corallo
  2022-11-17 16:37 ` [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:34   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 04/35] arm: improve tests and fix vdwdupq* Andrea Corallo
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vddupq_u<mode>_insn): Fix 'vddup.u'
	spacing.
	(mve_vddupq_m_wb_u<mode>_insn): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c: Likewise.
---
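A minimal sketch of how the spacing fix interacts with the new
check-function-bodies patterns, which compare whole lines and expect a tab
between the mnemonic and its operands. It assumes POSIX extended regexes in
place of Tcl regexps, so the non-capturing groups of the real pattern are
rewritten as plain groups and the optional trailing comment as `(...)?`.
The helper name `line_matches`, the registers and the immediate are
hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* One expected body line from the new tests, anchored at both ends
   because the whole line has to match.  "(?:ip|fp|r[0-9]+)" is written
   as "(ip|fp|r[0-9]+)" and "(?:\t@.*|)" as "(\t@.*)?" since POSIX ERE
   has no non-capturing groups.  */
static const char vddup_line[] =
  "^\tvddup.u16\tq[0-9]+, (ip|fp|r[0-9]+), #[0-9]+(\t@.*)?$";

/* Return true if LINE, a single assembly line without its newline,
   matches the expected vddup.u16 line.  */
static bool
line_matches (const char *line)
{
  regex_t re;
  if (regcomp (&re, vddup_line, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With this sketch, a tab-separated line such as "\tvddup.u16\tq3, r0, #1"
is accepted, with or without a trailing "\t@ ..." comment, while the
two-space form "\tvddup.u16  q3, r0, #1" that the old template would
produce is rejected.
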
 gcc/config/arm/mve.md                         |  4 +-
 .../arm/mve/intrinsics/vddupq_m_n_u16.c       | 42 +++++++++++++--
 .../arm/mve/intrinsics/vddupq_m_n_u32.c       | 46 +++++++++++++---
 .../arm/mve/intrinsics/vddupq_m_n_u8.c        | 46 +++++++++++++---
 .../arm/mve/intrinsics/vddupq_m_wb_u16.c      | 42 +++++++++++++--
 .../arm/mve/intrinsics/vddupq_m_wb_u32.c      | 46 +++++++++++++---
 .../arm/mve/intrinsics/vddupq_m_wb_u8.c       | 46 +++++++++++++---
 .../arm/mve/intrinsics/vddupq_n_u16.c         | 32 ++++++++++--
 .../arm/mve/intrinsics/vddupq_n_u32.c         | 28 +++++++++-
 .../arm/mve/intrinsics/vddupq_n_u8.c          | 28 +++++++++-
 .../arm/mve/intrinsics/vddupq_wb_u16.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/vddupq_wb_u32.c        | 28 +++++++++-
 .../arm/mve/intrinsics/vddupq_wb_u8.c         | 28 +++++++++-
 .../arm/mve/intrinsics/vddupq_x_n_u16.c       | 42 +++++++++++++--
 .../arm/mve/intrinsics/vddupq_x_n_u32.c       | 46 +++++++++++++---
 .../arm/mve/intrinsics/vddupq_x_n_u8.c        | 46 +++++++++++++---
 .../arm/mve/intrinsics/vddupq_x_wb_u16.c      | 52 +++++++++++++++----
 .../arm/mve/intrinsics/vddupq_x_wb_u32.c      | 52 +++++++++++++++----
 .../arm/mve/intrinsics/vddupq_x_wb_u8.c       | 52 +++++++++++++++----
 19 files changed, 642 insertions(+), 96 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 62186f124da..1215f845388 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -9043,7 +9043,7 @@ (define_insn "mve_vddupq_u<mode>_insn"
        (minus:SI (match_dup 2)
 		 (match_operand:SI 4 "immediate_operand" "i")))]
  "TARGET_HAVE_MVE"
- "vddup.u%#<V_sz_elem>  %q0, %1, %3")
+ "vddup.u%#<V_sz_elem>\t%q0, %1, %3")
 
 ;;
 ;; [vddupq_m_n_u])
@@ -9079,7 +9079,7 @@ (define_insn "mve_vddupq_m_wb_u<mode>_insn"
        (minus:SI (match_dup 3)
 		 (match_operand:SI 6 "immediate_operand" "i")))]
  "TARGET_HAVE_MVE"
- "vpst\;\tvddupt.u%#<V_sz_elem>\t%q0, %2, %4"
+ "vpst\;vddupt.u%#<V_sz_elem>\t%q0, %2, %4"
  [(set_attr "length""8")])
 
 ;;
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
index 7332711f6a7..7c8b0152763 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vddupq_m_n_u16 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vddupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vddupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
index 54ad91f2803..810a1a7e21b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
 {
-  return vddupq_m_n_u32 (inactive, a, 4, p);
+  return vddupq_m_n_u32 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
 {
-  return vddupq_m (inactive, a, 4, p);
+  return vddupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vddupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
index 3746b5db6e5..6642b9f4b88 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
 {
-  return vddupq_m_n_u8 (inactive, a, 4, p);
+  return vddupq_m_n_u8 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
 {
-  return vddupq_m (inactive, a, 4, p);
+  return vddupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vddupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
index 8b5d9e86469..cc6a19516d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
 {
   return vddupq_m_wb_u16 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
 {
   return vddupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vddupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
index 7a8c363ac70..cd6c6f86eea 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_m_wb_u32 (inactive, a, 4, p);
+  return vddupq_m_wb_u32 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_m (inactive, a, 4, p);
+  return vddupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vddupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
index 45784a5c9cd..fe186e743da 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_m_wb_u8 (inactive, a, 4, p);
+  return vddupq_m_wb_u8 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_m (inactive, a, 4, p);
+  return vddupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vddupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
index 4684e2af553..2dba2d74b61 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a)
 {
-  return vddupq_n_u16 (a, 4);
+  return vddupq_n_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a)
 {
-  return vddupq_u16 (a, 4);
+  return vddupq_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return vddupq_u16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
index aeaa83eb6bc..6b5cf6c75b0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a)
 {
   return vddupq_n_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a)
 {
   return vddupq_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return vddupq_u32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
index 255a9f80b6b..174e422f4ef 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a)
 {
   return vddupq_n_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a)
 {
   return vddupq_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return vddupq_u8 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
index 40fc6cf2197..6a471a7f72f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t *a)
 {
-  return vddupq_wb_u16 (a, 4);
+  return vddupq_wb_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t *a)
 {
-  return vddupq_u16 (a, 4);
+  return vddupq_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return vddupq_u16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
index 09b5b1f2f80..debf420d3e8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t *a)
 {
   return vddupq_wb_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t *a)
 {
   return vddupq_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return vddupq_u32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
index 00dfa906748..8e6ef8adccd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t *a)
 {
   return vddupq_wb_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t *a)
 {
   return vddupq_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vddup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return vddupq_u8 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
index 5b0fc0b6340..1aafaf87b82 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a, mve_pred16_t p)
 {
   return vddupq_x_n_u16 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a, mve_pred16_t p)
 {
   return vddupq_x_u16 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (mve_pred16_t p)
+{
+  return vddupq_x_u16 (1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
index 66def991b65..2e3e268dbee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, mve_pred16_t p)
 {
-  return vddupq_x_n_u32 (a, 4, p);
+  return vddupq_x_n_u32 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a, mve_pred16_t p)
 {
-  return vddupq_x_u32 (a, 4, p);
+  return vddupq_x_u32 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (mve_pred16_t p)
+{
+  return vddupq_x_u32 (1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c
index 8ac322ed52d..bdf563a8074 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a, mve_pred16_t p)
 {
-  return vddupq_x_n_u8 (a, 4, p);
+  return vddupq_x_n_u8 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a, mve_pred16_t p)
 {
-  return vddupq_x_u8 (a, 4, p);
+  return vddupq_x_u8 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (mve_pred16_t p)
+{
+  return vddupq_x_u8 (1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c
index 030048f840a..713d8b731c8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c
@@ -1,25 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
-uint32_t *a;
-
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (mve_pred16_t p)
+foo (uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_x_wb_u16 (a, 2, p);
+  return vddupq_x_wb_u16 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (uint32_t *a, mve_pred16_t p)
+{
+  return vddupq_x_u16 (a, 1, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (mve_pred16_t p)
+foo2 (mve_pred16_t p)
 {
-  return vddupq_x_u16 (a, 2, p);
+  return vddupq_x_u16 (1, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c
index 95bf28e4052..9f484b3b8fb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c
@@ -1,25 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
-uint32_t *a;
-
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (mve_pred16_t p)
+foo (uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_x_wb_u32 (a, 8, p);
+  return vddupq_x_wb_u32 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (uint32_t *a, mve_pred16_t p)
+{
+  return vddupq_x_u32 (a, 1, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (mve_pred16_t p)
+foo2 (mve_pred16_t p)
 {
-  return vddupq_x_u32 (a, 8, p);
+  return vddupq_x_u32 (1, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c
index 2fe81dded55..aa83bfed125 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c
@@ -1,25 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
-uint32_t *a;
-
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (mve_pred16_t p)
+foo (uint32_t *a, mve_pred16_t p)
 {
-  return vddupq_x_wb_u8 (a, 8, p);
+  return vddupq_x_wb_u8 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (uint32_t *a, mve_pred16_t p)
+{
+  return vddupq_x_u8 (a, 1, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (mve_pred16_t p)
+foo2 (mve_pred16_t p)
 {
-  return vddupq_x_u8 (a, 8, p);
+  return vddupq_x_u8 (1, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vddupt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 04/35] arm: improve tests and fix vdwdupq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (2 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 03/35] arm: improve tests and fix vddupq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:35   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 05/35] arm: improve vidupq* tests Andrea Corallo
                   ` (31 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vdwdupq_m_wb_u<mode>_insn): Fix spacing.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.
---
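Note on the mve.md "Fix spacing" change: as far as I understand the
machine-description reader, `\;` in an output template already expands to
a newline followed by a tab, so the old template's extra `\t` put two tabs
in front of vdwdupt. The new function-body patterns expect a single
leading tab. Below is a minimal sketch of that expansion. It is not GCC
code: `expand_md_escapes` and `tabs_before_second_insn` are hypothetical
helpers written only for this illustration, and the template strings are
shortened.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): copy SRC into DST, expanding the
   two-character sequences `\;` to newline + tab and `\t` to a tab.  This
   models, under the assumption stated above, what the md reader does to an
   output template.  Returns the length of the expanded string.  */
size_t
expand_md_escapes (const char *src, char *dst, size_t dst_size)
{
  size_t n = 0;

  if (dst_size == 0)
    return 0;

  /* Keep room for up to two output characters plus the terminating NUL.  */
  while (*src && n + 2 < dst_size)
    {
      if (src[0] == '\\' && src[1] == ';')
	{
	  dst[n++] = '\n';
	  dst[n++] = '\t';
	  src += 2;
	}
      else if (src[0] == '\\' && src[1] == 't')
	{
	  dst[n++] = '\t';
	  src += 2;
	}
      else
	dst[n++] = *src++;
    }
  dst[n] = '\0';
  return n;
}

/* Hypothetical helper: count the tabs that immediately follow the first
   newline, i.e. the indentation of the second emitted instruction.  */
size_t
tabs_before_second_insn (const char *expanded)
{
  const char *p = strchr (expanded, '\n');
  size_t tabs = 0;

  if (p == NULL)
    return 0;
  for (p++; *p == '\t'; p++)
    tabs++;
  return tabs;
}
```

Feeding the old template prefix (`vpst\;\tvdwdupt.u16`) through the sketch
gives two tabs before the second mnemonic, the new one (`vpst\;vdwdupt.u16`)
gives one.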
 gcc/config/arm/mve.md                         |  2 +-
 .../arm/mve/intrinsics/vdwdupq_m_n_u16.c      | 44 ++++++++++++++--
 .../arm/mve/intrinsics/vdwdupq_m_n_u32.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_m_n_u8.c       | 46 ++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_m_wb_u16.c     | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_m_wb_u32.c     | 48 +++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_m_wb_u8.c      | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_n_u16.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/vdwdupq_n_u32.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/vdwdupq_n_u8.c         | 32 ++++++++++--
 .../arm/mve/intrinsics/vdwdupq_wb_u16.c       | 32 ++++++++++--
 .../arm/mve/intrinsics/vdwdupq_wb_u32.c       | 32 ++++++++++--
 .../arm/mve/intrinsics/vdwdupq_wb_u8.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/vdwdupq_x_n_u16.c      | 42 ++++++++++++++--
 .../arm/mve/intrinsics/vdwdupq_x_n_u32.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_x_n_u8.c       | 46 ++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_x_wb_u16.c     | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_x_wb_u32.c     | 46 ++++++++++++++---
 .../arm/mve/intrinsics/vdwdupq_x_wb_u8.c      | 50 ++++++++++++++++---
 19 files changed, 655 insertions(+), 103 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 1215f845388..58ffe03c499 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -9195,7 +9195,7 @@ (define_insn "mve_vdwdupq_m_wb_u<mode>_insn"
 	 VDWDUPQ_M))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;\tvdwdupt.u%#<V_sz_elem>\t%q2, %3, %R4, %5"
+  "vpst\;vdwdupt.u%#<V_sz_elem>\t%q2, %3, %R4, %5"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
index 5303fd7d361..8f53f5ef0cb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 1, p);
+  return vdwdupq_m_n_u16 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
   return vdwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vdwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c
index 9f22bd7f852..30e971fb733 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 4, p);
+  return vdwdupq_m_n_u32 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 4, p);
+  return vdwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vdwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c
index 0591e731958..0abc19a2318 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 4, p);
+  return vdwdupq_m_n_u8 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 4, p);
+  return vdwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vdwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
index e4e7b47e082..b3e6affbf8f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (uint16x8_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 8, p);
+  return vdwdupq_m_wb_u16 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (uint16x8_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 8, p);
+  return vdwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vdwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
index 42917dc9886..60c52b0d850 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (uint32x4_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 1, p);
+  return vdwdupq_m_wb_u32 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (uint32x4_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
   return vdwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vdwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
index 32c3153ffb3..459321a7984 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (uint8x16_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 2, p);
+  return vdwdupq_m_wb_u8 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (uint8x16_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, a, b, 2, p);
+  return vdwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vdwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c
index 725a6e4bc0e..9f76dbf35eb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a, uint32_t b)
 {
-  return vdwdupq_n_u16 (a, b, 2);
+  return vdwdupq_n_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a, uint32_t b)
 {
-  return vdwdupq_u16 (a, b, 2);
+  return vdwdupq_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return vdwdupq_u16 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c
index 6ceaadb984d..962f766b496 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, uint32_t b)
 {
-  return vdwdupq_n_u32 (a, b, 8);
+  return vdwdupq_n_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a, uint32_t b)
 {
-  return vdwdupq_u32 (a, b, 8);
+  return vdwdupq_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return vdwdupq_u32 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c
index a1712e418be..c73b1b69661 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a, uint32_t b)
 {
-  return vdwdupq_n_u8 (a, b, 4);
+  return vdwdupq_n_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a, uint32_t b)
 {
-  return vdwdupq_u8 (a, b, 4);
+  return vdwdupq_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return vdwdupq_u8 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c
index 0164ea9502c..3b1968d78aa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t *a, uint32_t b)
 {
-  return vdwdupq_wb_u16 (a, b, 2);
+  return vdwdupq_wb_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t *a, uint32_t b)
 {
-  return vdwdupq_u16 (a, b, 2);
+  return vdwdupq_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return vdwdupq_u16 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c
index 7681371b016..8554f62ee6b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t *a, uint32_t b)
 {
-  return vdwdupq_wb_u32 (a, b, 8);
+  return vdwdupq_wb_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t *a, uint32_t b)
 {
-  return vdwdupq_u32 (a, b, 8);
+  return vdwdupq_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return vdwdupq_u32 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c
index 6f60bb09b24..eb91a80daf5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t *a, uint32_t b)
 {
-  return vdwdupq_wb_u8 (a, b, 4);
+  return vdwdupq_wb_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t *a, uint32_t b)
 {
-  return vdwdupq_u8 (a, b, 4);
+  return vdwdupq_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return vdwdupq_u8 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c
index ce975267531..9c0fd1e253c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a, uint32_t b, mve_pred16_t p)
 {
   return vdwdupq_x_n_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
 {
   return vdwdupq_x_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (mve_pred16_t p)
+{
+  return vdwdupq_x_u16 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c
index 9ed75d292d8..3107e2fdbbe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_n_u32 (a, b, 4, p);
+  return vdwdupq_x_n_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_u32 (a, b, 4, p);
+  return vdwdupq_x_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (mve_pred16_t p)
+{
+  return vdwdupq_x_u32 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c
index 3705094c4df..03d01e0dd43 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_n_u8 (a, b, 4, p);
+  return vdwdupq_x_n_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_u8 (a, b, 4, p);
+  return vdwdupq_x_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (mve_pred16_t p)
+{
+  return vdwdupq_x_u8 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c
index caf744d7255..f7dca660c03 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_wb_u16 (a, b, 8, p);
+  return vdwdupq_x_wb_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_u16 (a, b, 8, p);
+  return vdwdupq_x_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (mve_pred16_t p)
+{
+  return vdwdupq_x_u16 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c
index 8c8be86bce6..032ae94e8c3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
   return vdwdupq_x_wb_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
   return vdwdupq_x_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (mve_pred16_t p)
+{
+  return vdwdupq_x_u32 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c
index 1c6ef4ed33f..5d238a7a865 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_wb_u8 (a, b, 2, p);
+  return vdwdupq_x_wb_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return vdwdupq_x_u8 (a, b, 2, p);
+  return vdwdupq_x_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (mve_pred16_t p)
+{
+  return vdwdupq_x_u8 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 05/35] arm: improve vidupq* tests
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (3 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 04/35] arm: improve tests and fix vdwdupq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:36   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 06/35] arm: improve tests and fix vdupq* Andrea Corallo
                   ` (30 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c: Improve tests.
	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vidupq_m_n_u16.c       | 46 +++++++++++++---
 .../arm/mve/intrinsics/vidupq_m_n_u32.c       | 42 +++++++++++++--
 .../arm/mve/intrinsics/vidupq_m_n_u8.c        | 42 +++++++++++++--
 .../arm/mve/intrinsics/vidupq_m_wb_u16.c      | 46 +++++++++++++---
 .../arm/mve/intrinsics/vidupq_m_wb_u32.c      | 42 +++++++++++++--
 .../arm/mve/intrinsics/vidupq_m_wb_u8.c       | 42 +++++++++++++--
 .../arm/mve/intrinsics/vidupq_n_u16.c         | 32 ++++++++++--
 .../arm/mve/intrinsics/vidupq_n_u32.c         | 28 +++++++++-
 .../arm/mve/intrinsics/vidupq_n_u8.c          | 28 +++++++++-
 .../arm/mve/intrinsics/vidupq_wb_u16.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/vidupq_wb_u32.c        | 28 +++++++++-
 .../arm/mve/intrinsics/vidupq_wb_u8.c         | 28 +++++++++-
 .../arm/mve/intrinsics/vidupq_x_n_u16.c       | 46 +++++++++++++---
 .../arm/mve/intrinsics/vidupq_x_n_u32.c       | 42 +++++++++++++--
 .../arm/mve/intrinsics/vidupq_x_n_u8.c        | 42 +++++++++++++--
 .../arm/mve/intrinsics/vidupq_x_wb_u16.c      | 52 +++++++++++++++----
 .../arm/mve/intrinsics/vidupq_x_wb_u32.c      | 52 +++++++++++++++----
 .../arm/mve/intrinsics/vidupq_x_wb_u8.c       | 52 +++++++++++++++----
 18 files changed, 634 insertions(+), 88 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
index 822d41197e6..b4ee7af36e3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
 {
-  return vidupq_m_n_u16 (inactive, a, 4, p);
+  return vidupq_m_n_u16 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
 {
-  return vidupq_m (inactive, a, 4, p);
+  return vidupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vidupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
index c01826e15dc..b13a7a80dcb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vidupq_m_n_u32 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vidupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vidupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c
index e269665813c..b731002724a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vidupq_m_n_u8 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vidupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vidupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c
index 8d21bc7db80..0e2ad6a2b55 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vidupq_m_wb_u16 (inactive, a, 4, p);
+  return vidupq_m_wb_u16 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vidupq_m (inactive, a, 4, p);
+  return vidupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vidupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c
index e7bc06cd826..786a05eee35 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
 {
   return vidupq_m_wb_u32 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
 {
   return vidupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vidupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c
index a8a2f9a1c49..3fcc3ba0d67 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
 {
   return vidupq_m_wb_u8 (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
 {
   return vidupq_m (inactive, a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vidupq_m (inactive, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c
index c59ca1ebf74..a6ffdc05ce5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a)
 {
-  return vidupq_n_u16 (a, 4);
+  return vidupq_n_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a)
 {
-  return vidupq_u16 (a, 4);
+  return vidupq_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return vidupq_u16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c
index 7e835e0868c..8cd43e38255 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a)
 {
   return vidupq_n_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a)
 {
   return vidupq_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return vidupq_u32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c
index 06d1a1a1480..4005eabb45d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a)
 {
   return vidupq_n_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a)
 {
   return vidupq_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return vidupq_u8 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c
index 1cb0ded198f..3ad89c0536c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t *a)
 {
-  return vidupq_wb_u16 (a, 4);
+  return vidupq_wb_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t *a)
 {
-  return vidupq_u16 (a, 4);
+  return vidupq_u16 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return vidupq_u16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
index e5d9c5327fb..45eb1b09a5b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t *a)
 {
   return vidupq_wb_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t *a)
 {
   return vidupq_u32 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return vidupq_u32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
index 57e1bb46776..beb0aae67a9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t *a)
 {
   return vidupq_wb_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t *a)
 {
   return vidupq_u8 (a, 1);
 }
 
-/* { dg-final { scan-assembler "vidup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return vidupq_u8 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
index bdf8ec2b047..74cd4310213 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a, mve_pred16_t p)
 {
-  return vidupq_x_n_u16 (a, 4, p);
+  return vidupq_x_n_u16 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a, mve_pred16_t p)
 {
-  return vidupq_x_u16 (a, 4, p);
+  return vidupq_x_u16 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (mve_pred16_t p)
+{
+  return vidupq_x_u16 (1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
index 8be549cb446..3111b1a54e6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, mve_pred16_t p)
 {
   return vidupq_x_n_u32 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a, mve_pred16_t p)
 {
   return vidupq_x_u32 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (mve_pred16_t p)
+{
+  return vidupq_x_u32 (1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
index 1e1975017de..5bedb4f9e79 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a, mve_pred16_t p)
 {
   return vidupq_x_n_u8 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a, mve_pred16_t p)
 {
   return vidupq_x_u8 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (mve_pred16_t p)
+{
+  return vidupq_x_u8 (1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
index 31197a76cfa..caf334fa32f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
@@ -1,25 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
-uint32_t *a;
-
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (mve_pred16_t p)
+foo (uint32_t *a, mve_pred16_t p)
 {
-  return vidupq_x_wb_u16 (a, 8, p);
+  return vidupq_x_wb_u16 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (uint32_t *a, mve_pred16_t p)
+{
+  return vidupq_x_u16 (a, 1, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (mve_pred16_t p)
+foo2 (mve_pred16_t p)
 {
-  return vidupq_x_u16 (a, 8, p);
+  return vidupq_x_u16 (1, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
index cef56f133e8..11895e303cf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
@@ -1,25 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
-uint32_t *a;
-
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (mve_pred16_t p)
+foo (uint32_t *a, mve_pred16_t p)
 {
-  return vidupq_x_wb_u32 (a, 2, p);
+  return vidupq_x_wb_u32 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (uint32_t *a, mve_pred16_t p)
+{
+  return vidupq_x_u32 (a, 1, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (mve_pred16_t p)
+foo2 (mve_pred16_t p)
 {
-  return vidupq_x_u32 (a, 2, p);
+  return vidupq_x_u32 (1, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
index 0403ba1174c..b951d4cfe94 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
@@ -1,25 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
-uint32_t * a;
-
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (mve_pred16_t p)
+foo (uint32_t *a, mve_pred16_t p)
 {
-  return vidupq_x_wb_u8 (a, 2, p);
+  return vidupq_x_wb_u8 (a, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (uint32_t *a, mve_pred16_t p)
+{
+  return vidupq_x_u8 (a, 1, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (mve_pred16_t p)
+foo2 (mve_pred16_t p)
 {
-  return vidupq_x_u8 (a, 2, p);
+  return vidupq_x_u8 (1, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vidupt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 06/35] arm: improve tests and fix vdupq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (4 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 05/35] arm: improve vidupq* tests Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:37   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 07/35] arm: improve tests and fix vcmp* Andrea Corallo
                   ` (29 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vdupq_n_f<mode>)
	(mve_vdupq_n_<supf><mode>, mve_vdupq_m_n_<supf><mode>)
	(mve_vdupq_m_n_f<mode>): Fix spacing.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c: Likewise.
---
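Note for reviewers (illustrative only, not part of the commit): the
mve.md hunks below replace the literal spaces (or a raw tab) between the
mnemonic and its operands with "\t", and the new check-function-bodies
patterns also spell that separator as a tab. A minimal Python sketch of
why the separator matters follows. It assumes each pattern line has to
match one whole line of assembly, and it uses Python's re module as a
stand-in for the Tcl regexps the testsuite actually runs;
body_line_matches and the sample assembly lines are hypothetical, not
real compiler output.

```python
import re

# Pattern copied from the vdupq_n_*16 tests below, with the tab written as \t.
VDUP_16 = "\tvdup.16\tq[0-9]+, (?:ip|fp|r[0-9]+)(?:\t@.*|)"


def body_line_matches(pattern: str, asm_line: str) -> bool:
    """Return True if the pattern matches the entire assembly line.

    Whole-line matching is an assumption made for this sketch.
    """
    return re.fullmatch(pattern, asm_line) is not None


if __name__ == "__main__":
    samples = [
        "\tvdup.16\tq0, r0",             # tab separator, as after this patch
        "\tvdup.16\tq0, r0\t@ comment",  # optional trailing assembler comment
        "\tvdup.16   q0, r0",            # three spaces, as in the old template
    ]
    for line in samples:
        print(repr(line), body_line_matches(VDUP_16, line))
```

With these inputs the two tab-separated lines match and the
space-separated one does not, which is the kind of mismatch the spacing
fix is meant to avoid.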
 gcc/config/arm/mve.md                         |  8 ++--
 .../arm/mve/intrinsics/vdupq_m_n_f16.c        | 41 +++++++++++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_f32.c        | 41 +++++++++++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_s16.c        | 25 +++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_s32.c        | 25 +++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_s8.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_u16.c        | 41 +++++++++++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_u32.c        | 41 +++++++++++++++++--
 .../arm/mve/intrinsics/vdupq_m_n_u8.c         | 41 +++++++++++++++++--
 .../arm/mve/intrinsics/vdupq_n_f16.c          | 21 +++++++++-
 .../arm/mve/intrinsics/vdupq_n_f32.c          | 21 +++++++++-
 .../arm/mve/intrinsics/vdupq_n_s16.c          | 13 ++++--
 .../arm/mve/intrinsics/vdupq_n_s32.c          | 13 ++++--
 .../arm/mve/intrinsics/vdupq_n_s8.c           |  9 +++-
 .../arm/mve/intrinsics/vdupq_n_u16.c          | 23 ++++++++++-
 .../arm/mve/intrinsics/vdupq_n_u32.c          | 23 ++++++++++-
 .../arm/mve/intrinsics/vdupq_n_u8.c           | 23 ++++++++++-
 .../arm/mve/intrinsics/vdupq_x_n_f16.c        | 30 +++++++++++++-
 .../arm/mve/intrinsics/vdupq_x_n_f32.c        | 30 +++++++++++++-
 .../arm/mve/intrinsics/vdupq_x_n_s16.c        | 14 ++++++-
 .../arm/mve/intrinsics/vdupq_x_n_s32.c        | 14 ++++++-
 .../arm/mve/intrinsics/vdupq_x_n_s8.c         | 14 ++++++-
 .../arm/mve/intrinsics/vdupq_x_n_u16.c        | 30 +++++++++++++-
 .../arm/mve/intrinsics/vdupq_x_n_u32.c        | 30 +++++++++++++-
 .../arm/mve/intrinsics/vdupq_x_n_u8.c         | 30 +++++++++++++-
 25 files changed, 567 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 58ffe03c499..6d5270281ec 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -266,7 +266,7 @@ (define_insn "mve_vdupq_n_f<mode>"
 	 VDUPQ_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vdup.%#<V_sz_elem>   %q0, %1"
+  "vdup.%#<V_sz_elem>\t%q0, %1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -435,7 +435,7 @@ (define_insn "mve_vdupq_n_<supf><mode>"
 	 VDUPQ_N))
   ]
   "TARGET_HAVE_MVE"
-  "vdup.%#<V_sz_elem>   %q0, %1"
+  "vdup.%#<V_sz_elem>\t%q0, %1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3046,7 +3046,7 @@ (define_insn "mve_vdupq_m_n_<supf><mode>"
 	 VDUPQ_M_N))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vdupt.%#<V_sz_elem>	%q0, %2"
+  "vpst\;vdupt.%#<V_sz_elem>\t%q0, %2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -3991,7 +3991,7 @@ (define_insn "mve_vdupq_m_n_f<mode>"
 	 VDUPQ_M_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vdupt.%#<V_sz_elem>	%q0, %2"
+  "vpst\;vdupt.%#<V_sz_elem>\t%q0, %2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
index 0b749be3527..bfa471bcb31 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16_t a, mve_pred16_t p)
 {
   return vdupq_m_n_f16 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t inactive, mve_pred16_t p)
+{
+  return vdupq_m (inactive, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
index 9cca5310c7a..e1dd8f58ad0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32_t a, mve_pred16_t p)
 {
   return vdupq_m_n_f32 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t inactive, mve_pred16_t p)
+{
+  return vdupq_m (inactive, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
index b521f13e94f..52304ace03a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16_t a, mve_pred16_t p)
 {
   return vdupq_m_n_s16 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
index 96aa195dc18..44a80c5d5bc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32_t a, mve_pred16_t p)
 {
   return vdupq_m_n_s32 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
index f1d222000c1..1630a3b9234 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8_t a, mve_pred16_t p)
 {
   return vdupq_m_n_s8 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
index 39d0c9f502d..d3df8b69248 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16_t a, mve_pred16_t p)
 {
   return vdupq_m_n_u16 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return vdupq_m (inactive, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
index fc107172e16..e6bb0cc2c38 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vdupq_m_n_u32 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return vdupq_m (inactive, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
index 9fd3bc443cb..ad6f6d04ae3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8_t a, mve_pred16_t p)
 {
   return vdupq_m_n_u8 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8_t a, mve_pred16_t p)
 {
   return vdupq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return vdupq_m (inactive, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
index 62bfc194533..fc5a7933653 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
@@ -1,13 +1,32 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16_t a)
 {
   return vdupq_n_f16 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.16"  }  } */
+/*
+**foo1:
+**	...
+**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 ()
+{
+  return vdupq_n_f16 (1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
index f5ad2286d8d..a6be82e5927 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
@@ -1,13 +1,32 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32_t a)
 {
   return vdupq_n_f32 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.32"  }  } */
+/*
+**foo1:
+**	...
+**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 ()
+{
+  return vdupq_n_f32 (1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
index 1378522a18e..f842b96c3b1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
@@ -1,13 +1,20 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16_t a)
 {
   return vdupq_n_s16 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
index 43affe856c0..05cbff8fdae 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
@@ -1,13 +1,20 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32_t a)
 {
   return vdupq_n_s32 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
index 3f934dc5d59..1d141161604 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
@@ -1,13 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8_t a)
 {
   return vdupq_n_s8 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
index 93268643fec..4839d427e65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
@@ -1,13 +1,32 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16_t a)
 {
-    return vdupq_n_u16 (a);
+  return vdupq_n_u16 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.16"  }  } */
+/*
+**foo1:
+**	...
+**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 ()
+{
+  return vdupq_n_u16 (1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
index 276e9ddc67f..f0069eb7280 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
@@ -1,13 +1,32 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a)
 {
-    return vdupq_n_u32 (a);
+  return vdupq_n_u32 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.32"  }  } */
+/*
+**foo1:
+**	...
+**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 ()
+{
+  return vdupq_n_u32 (1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
index d0361c15047..fe26687ae45 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
@@ -1,13 +1,32 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vdup.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8_t a)
 {
-    return vdupq_n_u8 (a);
+  return vdupq_n_u8 (a);
 }
 
-/* { dg-final { scan-assembler "vdup.8"  }  } */
+/*
+**foo1:
+**	...
+**	vdup.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 ()
+{
+  return vdupq_n_u8 (1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c
index c91ee62791c..11ebb47f94f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c
@@ -1,14 +1,40 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16_t a, mve_pred16_t p)
 {
   return vdupq_x_n_f16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.16"  }  } */
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 (mve_pred16_t p)
+{
+  return vdupq_x_n_f16 (1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c
index c2b39051f5b..4e79bd54f71 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c
@@ -1,14 +1,40 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32_t a, mve_pred16_t p)
 {
   return vdupq_x_n_f32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.32"  }  } */
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 (mve_pred16_t p)
+{
+  return vdupq_x_n_f32 (1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
index cc8a5bfeca1..90288777df7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
@@ -1,14 +1,24 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16_t a, mve_pred16_t p)
 {
   return vdupq_x_n_s16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
index b3ed3eb68e8..c4c906e0682 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
@@ -1,14 +1,24 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32_t a, mve_pred16_t p)
 {
   return vdupq_x_n_s32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c
index 3be865dcc84..6234730827e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c
@@ -1,14 +1,24 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8_t a, mve_pred16_t p)
 {
   return vdupq_x_n_s8 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c
index d01338aeb91..821fcddcab1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c
@@ -1,14 +1,40 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16_t a, mve_pred16_t p)
 {
   return vdupq_x_n_u16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.16"  }  } */
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (mve_pred16_t p)
+{
+  return vdupq_x_n_u16 (1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c
index 8fa7d4552bc..20125df6226 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c
@@ -1,14 +1,40 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, mve_pred16_t p)
 {
   return vdupq_x_n_u32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.32"  }  } */
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (mve_pred16_t p)
+{
+  return vdupq_x_n_u32 (1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c
index 96ad899c9c2..defaaeebfcf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c
@@ -1,14 +1,40 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8_t a, mve_pred16_t p)
 {
   return vdupq_x_n_u8 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vdupt.8"  }  } */
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (mve_pred16_t p)
+{
+  return vdupq_x_n_u8 (1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 07/35] arm: improve tests and fix vcmp*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (5 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 06/35] arm: improve tests and fix vdupq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:40   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 08/35] arm: improve tests for vmin* Andrea Corallo
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Fix
	spacing.
	* config/arm/arm_mve.h (__arm_vcmpgtq_m, __arm_vcmpleq_m)
	(__arm_vcmpltq_m, __arm_vcmpneq_m): Add missing defines.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_u8.c: Likewise.
---
 gcc/config/arm/arm_mve.h                      | 47 +++++++++++++++++++
 gcc/config/arm/mve.md                         |  2 +-
 .../arm/mve/intrinsics/vcmpcsq_m_n_u16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpcsq_m_n_u32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpcsq_m_n_u8.c       | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpcsq_m_u16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpcsq_m_u32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpcsq_m_u8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpcsq_n_u16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpcsq_n_u32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpcsq_n_u8.c         | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpcsq_u16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpcsq_u32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpcsq_u8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_f16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_f32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_m_f16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_f32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_f16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_f32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_s16.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_s32.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_s8.c       | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_u16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_u32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_n_u8.c       | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_s16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_s32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_s8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_u16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_u32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_m_u8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpeqq_n_f16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_f32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_s16.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_s32.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_s8.c         | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_u16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_u32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpeqq_n_u8.c         | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpeqq_s16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_s32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_s8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_u16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_u32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpeqq_u8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_f16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_f32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_m_f16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_f32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_n_f16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_n_f32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_n_s16.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_n_s32.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_n_s8.c       | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_s16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_s32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_m_s8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgeq_n_f16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpgeq_n_f32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpgeq_n_s16.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_n_s32.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_n_s8.c         | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_s16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_s32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgeq_s8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_f16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_f32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_m_f16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_f32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_n_f16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_n_f32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_n_s16.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_n_s32.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_n_s8.c       | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_s16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_s32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_m_s8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpgtq_n_f16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpgtq_n_f32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpgtq_n_s16.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_n_s32.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_n_s8.c         | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_s16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_s32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpgtq_s8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmphiq_m_n_u16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmphiq_m_n_u32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmphiq_m_n_u8.c       | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmphiq_m_u16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmphiq_m_u32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmphiq_m_u8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmphiq_n_u16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmphiq_n_u32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmphiq_n_u8.c         | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmphiq_u16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmphiq_u32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmphiq_u8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_f16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_f32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_m_f16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_f32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_n_f16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_n_f32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_n_s16.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_n_s32.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_n_s8.c       | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_s16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_s32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_m_s8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpleq_n_f16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpleq_n_f32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpleq_n_s16.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_n_s32.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_n_s8.c         | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_s16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_s32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpleq_s8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_f16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_f32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_m_f16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_f32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_n_f16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_n_f32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_n_s16.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_n_s32.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_n_s8.c       | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_s16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_s32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_m_s8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpltq_n_f16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpltq_n_f32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpltq_n_s16.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_n_s32.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_n_s8.c         | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_s16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_s32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpltq_s8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_f16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_f32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_m_f16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_f32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_f16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_f32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_s16.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_s32.c      | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_s8.c       | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_u16.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_u32.c      | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_n_u8.c       | 47 +++++++++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_s16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_s32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_s8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_u16.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_u32.c        | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_m_u8.c         | 29 ++++++++++--
 .../arm/mve/intrinsics/vcmpneq_n_f16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpneq_n_f32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpneq_n_s16.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_n_s32.c        | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_n_s8.c         | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_n_u16.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpneq_n_u32.c        | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpneq_n_u8.c         | 34 +++++++++++++-
 .../arm/mve/intrinsics/vcmpneq_s16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_s32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_s8.c           | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_u16.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_u32.c          | 20 +++++++-
 .../arm/mve/intrinsics/vcmpneq_u8.c           | 20 +++++++-
 170 files changed, 4512 insertions(+), 421 deletions(-)
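
The four defines added to arm_mve.h below (__arm_vcmpgtq_m, __arm_vcmpleq_m,
__arm_vcmpltq_m, __arm_vcmpneq_m) all dispatch the same way: they build a
pointer-to-array type whose two dimensions are the type ids of the first two
arguments, and _Generic selects the intrinsic variant matching that type.
What follows is a minimal host-side sketch of that dispatch, not the header's
code. Every name in it (TYPEID, TID_*, PICK_*, vec_s8, vec_s16, cmpgt_m*) is
a hypothetical stand-in rather than a GCC identifier. Unlike the header, it
selects a function designator and calls it afterwards, so it needs no
equivalent of __ARM_mve_coerce. It assumes the compiler accepts a _Generic
with constant results as an array dimension, as the header itself does.

```c
#include <assert.h>

/* Hypothetical type ids, standing in for the __ARM_mve_type_* enumerators.
   They start at 1 because they are used as array dimensions.  */
enum { TID_S8 = 1, TID_S16, TID_INT_N, TID_OTHER };

/* Tags returned by the stand-in variants so the caller can see which one
   was selected.  */
enum { PICK_S8 = 100, PICK_S16, PICK_N_S8, PICK_N_S16 };

/* Plain structs standing in for MVE vector types, so this builds on any
   C11 host.  */
typedef struct { signed char lane[16]; } vec_s8;
typedef struct { short lane[8]; } vec_s16;

/* Map an expression to a type id; all scalar integers share TID_INT_N.  */
#define TYPEID(x) _Generic ((x), \
  vec_s8: TID_S8, vec_s16: TID_S16, \
  signed char: TID_INT_N, short: TID_INT_N, int: TID_INT_N, \
  default: TID_OTHER)

/* Stand-in variants: vector/vector and vector/scalar ("_n") forms.  */
static int cmpgt_m_s8 (vec_s8 a, vec_s8 b, unsigned p)
{ (void) a; (void) b; (void) p; return PICK_S8; }
static int cmpgt_m_s16 (vec_s16 a, vec_s16 b, unsigned p)
{ (void) a; (void) b; (void) p; return PICK_S16; }
static int cmpgt_m_n_s8 (vec_s8 a, signed char b, unsigned p)
{ (void) a; (void) b; (void) p; return PICK_N_S8; }
static int cmpgt_m_n_s16 (vec_s16 a, short b, unsigned p)
{ (void) a; (void) b; (void) p; return PICK_N_S16; }

/* Dispatch on the pair of type ids.  There is no default association, so
   an unsupported argument combination is rejected at compile time.  */
#define cmpgt_m(p0, p1, p2) \
  _Generic ((int (*)[TYPEID (p0)][TYPEID (p1)]) 0, \
    int (*)[TID_S8][TID_S8]: cmpgt_m_s8, \
    int (*)[TID_S16][TID_S16]: cmpgt_m_s16, \
    int (*)[TID_S8][TID_INT_N]: cmpgt_m_n_s8, \
    int (*)[TID_S16][TID_INT_N]: cmpgt_m_n_s16) ((p0), (p1), (p2))

/* Exercise each of the four associations once.  */
void
dispatch_self_check (void)
{
  vec_s8 a8 = {{0}}, b8 = {{0}};
  vec_s16 a16 = {{0}}, b16 = {{0}};
  signed char s = 3;

  assert (cmpgt_m (a8, b8, 0xffffu) == PICK_S8);
  assert (cmpgt_m (a16, b16, 0xffffu) == PICK_S16);
  assert (cmpgt_m (a8, s, 0xffffu) == PICK_N_S8);
  assert (cmpgt_m (a16, 7, 0xffffu) == PICK_N_S16);
}
```

In this sketch a vector paired with any scalar integer resolves to the "_n"
variant, which mirrors the __ARM_mve_type_int_n rows of the real macros.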

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 073e3711623..684f997520f 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -39229,6 +39229,53 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
 
+
+#define __arm_vcmpgtq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+
+#define __arm_vcmpleq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpleq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpleq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+
+#define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpltq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpltq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+
+#define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpneq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
+
 #define __arm_vdupq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 6d5270281ec..3330a220aea 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -831,7 +831,7 @@ (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
 		    (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vcmp.<mve_cmp_type>%#<V_sz_elem>  <mve_cmp_op>, %q1, %q2"
+  "vcmp.<mve_cmp_type>%#<V_sz_elem>\t<mve_cmp_op>, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c
index a1640133012..de9fe5e7d01 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmpcsq_m_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmpcsq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vcmpcsq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c
index d269ec7e3ab..04df1b2dc61 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmpcsq_m_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmpcsq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vcmpcsq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c
index 52c16b3e70f..34ebadca248 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmpcsq_m_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmpcsq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vcmpcsq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c
index e68afa316a9..bc03bf687de 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmpcsq_m_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmpcsq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c
index 05d1b21b279..8e216d49a02 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmpcsq_m_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmpcsq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c
index 4c8a9d0aa2c..ac4196a2e48 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmpcsq_m_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmpcsq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
index 4124036003e..6038f4c8c65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vcmpcsq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vcmpcsq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a)
+{
+  return vcmpcsq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
index 463c1ee12b4..9f39aa761c8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vcmpcsq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vcmpcsq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a)
+{
+  return vcmpcsq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
index 92bc44a4bb6..0ce2cd13a7b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vcmpcsq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vcmpcsq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a)
+{
+  return vcmpcsq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c
index 26c7d750cef..5598d06875c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vcmpcsq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vcmpcsq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c
index c91b0e1c2e3..99b232b05dd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vcmpcsq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vcmpcsq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c
index 51ddab91500..571e57135ab 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vcmpcsq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vcmpcsq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c
index 556351f4984..57b276a1d4c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vcmpeqq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c
index 65b2f240520..ab1b25e2888 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vcmpeqq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
index 91b0ffa0afd..c5587884d0e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
index d66e9c8be34..4e9675fff51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
index 46b3f4499d3..a3cae828e79 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
index 7d672c129db..a7ce9e0c7e3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
index 912d4ad893d..7ba481e169f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
index 947c331622d..13c88eaabb5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
index e215d655ea2..dcf276dee44 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
index ea4716c450e..d59d5149a30 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
index 489c6ec0cb3..1fbf385d030 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
index e8dfce432d1..92758c98c9a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c
index 7e4c141e5d2..1ea35ed924b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c
index 904cfb6fe37..a9bc9733842 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c
index a7e12164e32..a9fe771a101 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c
index 283e1fd036e..826901874d7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c
index ad1739bd609..512b7f9c889 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c
index 595142e9cda..01b4507ba63 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmpeqq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
index f97209d2322..cf2812558ff 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b)
 {
   return vcmpeqq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a)
+{
+  return vcmpeqq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
index c80843288b2..13817174282 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b)
 {
   return vcmpeqq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a)
+{
+  return vcmpeqq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
index 69f1f531af4..bd29828492e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b)
 {
   return vcmpeqq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
index 06032dbcc20..2a0d84e9b51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b)
 {
   return vcmpeqq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
index 3ebd88be85b..524bbe9f3cb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b)
 {
   return vcmpeqq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
index 2f6c53a525e..3eeaa49aa97 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vcmpeqq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a)
+{
+  return vcmpeqq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
index 22fb5be97c5..a881bb841af 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vcmpeqq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a)
+{
+  return vcmpeqq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
index 79eaeed6950..429b2e35eb7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vcmpeqq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a)
+{
+  return vcmpeqq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
index 7951ead8a31..92a87c08773 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vcmpeqq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
index 659ccb4ac14..d3b87d59bfa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vcmpeqq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
index 9282ec2a97a..2b71bbf75f6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vcmpeqq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
index 318b7aa9306..1830b667bb6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vcmpeqq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
index 88e015f1fa3..2b2a5f920f3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vcmpeqq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
index 990a96f7b3f..9450c203394 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vcmpeqq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vcmpeqq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
index eea63a2fe50..fd8bcab4f25 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vcmpgeq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
index 64243fe3e8c..a2d50b580e7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vcmpgeq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
index 3588b0a536f..a631825fadd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
index 8ed1d22e919..b94e0738ef0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
index d106af8f53b..9f4903d9cfd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vcmpgeq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
index 1feef8adb7f..679e644f165 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vcmpgeq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
index c0ad38f6c6f..45e26d0a77b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
index 8974ce4d11a..3a6cad921f2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
index 981aa1b516c..ce1ca30d6ea 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
index 587432a6af1..51587a38b72 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
index e460a8dcafc..3ff0aaaa414 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
index cde28a314b9..df71ee57945 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpgeq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpgeq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
index 907fa5d50f6..2ca1b9d6684 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b)
 {
   return vcmpgeq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a)
+{
+  return vcmpgeq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
index e4d1406c049..3af110bd2b2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b)
 {
   return vcmpgeq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a)
+{
+  return vcmpgeq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
index f4aad09e783..3c1af8a93ab 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b)
 {
   return vcmpgeq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
index 2baa5204819..8b4e0f426e5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b)
 {
   return vcmpgeq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
index 1dcffcc3050..c1669bcdd90 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b)
 {
   return vcmpgeq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
index 817ffb2d8ac..593c7410dcb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vcmpgeq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
index d608b7fc9cf..9e26ea9938a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vcmpgeq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
index 506e6cede95..3cb2832e159 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vcmpgeq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vcmpgeq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
index e2bfd7ed156..8835fe08dba 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vcmpgtq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
index 1b4433f0e76..e1470884708 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vcmpgtq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
index def3f90a79d..cb9d5f4036f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
index 41a11563f36..b249b831782 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
index 80c86f65825..b375983f01e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vcmpgtq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
index 9b7aaadfe71..208a285cb39 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vcmpgtq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
index c0719d0110c..248e3093d2a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
index 26df8cea9fc..9843288296e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
index f20c50d69c1..80f1aa9ead0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
index da97abceb2e..9289c00b5af 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
index ab7c218c7af..8a3d7606bb7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
index 13520d1067b..2760795eb86 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpgtq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpgtq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
index 98e152cd999..9f2a4be319a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b)
 {
   return vcmpgtq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a)
+{
+  return vcmpgtq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
index 5691e2f9d35..bbf18ebe6e7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b)
 {
   return vcmpgtq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a)
+{
+  return vcmpgtq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c
index bc3bdbae2da..d833cb6f58e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b)
 {
   return vcmpgtq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c
index 409a3f9d808..28cd51b9582 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b)
 {
   return vcmpgtq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c
index 2624307be9d..5a953ca55f4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b)
 {
   return vcmpgtq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c
index be19e19f09f..b9c9da486f5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vcmpgtq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c
index 95f6c703b9d..0f79385358e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vcmpgtq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c
index 8ba180d8e39..f59dad94a57 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vcmpgtq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vcmpgtq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c
index 26e5fe3f900..136a2e44259 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmphiq_m_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmphiq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vcmphiq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c
index 51396b8d0cd..5640b97afaf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmphiq_m_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmphiq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vcmphiq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c
index 475f2e82345..e6474e45487 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmphiq_m_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmphiq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vcmphiq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c
index 98ba895fde0..38b9b90c803 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmphiq_m_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmphiq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c
index ee561b02d0c..97c8c1dfe05 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmphiq_m_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmphiq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c
index 0c5b29e2673..e2024ccda25 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmphiq_m_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmphiq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
index d39b755441d..36107fc7b8d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vcmphiq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vcmphiq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a)
+{
+  return vcmphiq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
index dbedea9b078..d34de8f65c7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vcmphiq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vcmphiq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a)
+{
+  return vcmphiq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
index 967bb206886..93a05b1a857 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vcmphiq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vcmphiq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a)
+{
+  return vcmphiq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c
index f9399498a99..40e65dc52f4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vcmphiq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vcmphiq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c
index becdef0696a..d87a4185762 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vcmphiq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vcmphiq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c
index 933cc69507d..80fd2a40b0f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vcmphiq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vcmphiq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c
index c2e69a5de92..209d81096af 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vcmpleq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c
index 923aee050d3..b92c5f66fd9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vcmpleq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c
index 66a37192985..e6136898ded 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpleq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c
index e679b338d58..2304e98d253 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpleq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c
index 42049fd57a4..a61db2817c1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpleq_m_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vcmpleq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c
index c68bd4e5900..7a2cdb4059d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpleq_m_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vcmpleq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c
index 0cdc14455a3..69fcab15b8a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpleq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c
index a955af8fa2b..617ebd6144f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpleq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c
index d9951e4a8cf..b8ee50dd55c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpleq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c
index f16aff86ef0..fcc376d6ec3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpleq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c
index 2c4e659e9cf..9983e89d80c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpleq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c
index 69b88cfb389..504e4feb5d1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpleq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpleq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
index 3fa3c5e0310..cfa6dbc07c7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b)
 {
   return vcmpleq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a)
+{
+  return vcmpleq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
index 8349de7b68c..c89558f4076 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b)
 {
   return vcmpleq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a)
+{
+  return vcmpleq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c
index 5ecae572227..da73fc14b77 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b)
 {
   return vcmpleq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c
index 02320e7a552..0951a5c13fb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b)
 {
   return vcmpleq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c
index a0ac97328b7..e4553354681 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b)
 {
   return vcmpleq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c
index 2fb4acd3d74..68500da9ddf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vcmpleq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c
index 2ae998efb7c..1966bcd94d3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vcmpleq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c
index da06b019cc1..e9f6e47e5d6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vcmpleq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vcmpleq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c
index eab80b2ddd9..b4958816bd8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vcmpltq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c
index f17d16482dd..752ab2b3e49 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vcmpltq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c
index 93c36f3a613..cbaacbe2b47 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpltq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c
index a17f0b02a95..96d0e7c7cc6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpltq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c
index 45d0f51b4d7..1e5db53198e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpltq_m_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vcmpltq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c
index 16e37ccaf8d..77de40ade01 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpltq_m_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vcmpltq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c
index d0e322fbede..beebe65a58f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpltq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c
index 7ec7963267a..07260c56ed3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpltq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c
index 22434e88cd6..7d1e9e7fbde 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpltq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c
index 359c0640784..c0f6dfc9432 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpltq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c
index 3df7e89a6f5..b6fc4700e73 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpltq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c
index 1055c2b661c..545b76359ad 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpltq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpltq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
index 2d55af20dd3..401ef21ba2b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b)
 {
   return vcmpltq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a)
+{
+  return vcmpltq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
index 2590ca83c45..380f071e564 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b)
 {
   return vcmpltq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a)
+{
+  return vcmpltq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c
index 169f6ad4610..a1d12392dd2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b)
 {
   return vcmpltq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c
index 534047c2df3..6332f75f327 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b)
 {
   return vcmpltq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c
index da659f1f2be..e0ac80caeb0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b)
 {
   return vcmpltq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c
index da4c90a07de..23843ad88f3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vcmpltq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c
index 5dc218a5f40..aeb7a6f9896 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vcmpltq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c
index ea5853c212c..2129b56a5f7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vcmpltq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vcmpltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c
index 8d1c6096c56..c27ea2f0de8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vcmpneq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c
index 860bd69c129..609de44d8e7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vcmpneq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c
index a4e62de7272..98f22337d61 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpneq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c
index b18a2e5fd88..7f6e96ae47e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpneq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c
index c127b3a68f6..71b3476fb18 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vcmpneq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c
index a8423d45708..d6dea8db865 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vcmpneq_m (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c
index 63ee1c3bffb..e72c9b62829 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c
index 10f6d448d76..47c90e31f49 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c
index 66e5d158c51..9d9da100046 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c
index ffe6ff919cf..ea8cf24b358 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vcmpneq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c
index 55e796a1138..30291dcdd9b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vcmpneq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c
index 3c8bd16647a..be75376a691 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c
@@ -1,22 +1,63 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmpneq_m_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vcmpneq_m (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c
index d3e1ce0e690..60e868141d0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpneq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c
index f5602ffd0da..780c544bef3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpneq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c
index 84b8b1617b0..15f6d316cba 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpneq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c
index 3c8943719bb..300852ed7b3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmpneq_m_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c
index 980cc4124b2..227b5f01eca 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmpneq_m_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c
index 2615dcb37b9..cfcb59f49cf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmpneq_m_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vcmpneq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
index e9e2a9c7b04..29e43f3fdf8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float16x8_t a, float16_t b)
 {
   return vcmpneq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float16x8_t a)
+{
+  return vcmpneq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
index eb64b17969c..688e77cd044 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (float32x4_t a, float32_t b)
 {
   return vcmpneq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (float32x4_t a)
+{
+  return vcmpneq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c
index 14689242ee4..2afc34d16e5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16_t b)
 {
   return vcmpneq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c
index 53418ff3923..6c323161316 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32_t b)
 {
   return vcmpneq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c
index fa405c281b4..5483d6dd2fe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8_t b)
 {
   return vcmpneq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
index cc8540b3a6c..d8edfb0d825 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vcmpneq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint16x8_t a)
+{
+  return vcmpneq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
index 07c9b1ade96..2b7a6b56830 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vcmpneq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint32x4_t a)
+{
+  return vcmpneq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
index eac5e96384e..2dab43af331 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
@@ -1,21 +1,51 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vcmpneq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
+mve_pred16_t
+foo2 (uint8x16_t a)
+{
+  return vcmpneq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
index 6b04ce70ffc..d57b607baa9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vcmpneq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
index cfb98d7e650..e02171f6686 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vcmpneq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
index ae69be4ba0b..0abef8c3e00 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vcmpneq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
index 51059f21191..7144f3ee2fc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vcmpneq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
index 42e4a3f4f2d..a31134f2f1d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vcmpneq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
index addacc15833..2801c8e3763 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
@@ -1,21 +1,37 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vcmpneq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
+**	...
+*/
 mve_pred16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vcmpneq (a, b);
 }
 
-/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 08/35] arm: improve tests for vmin*
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vminaq_m_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vminaq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vminaq_m_s16.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vminaq_m_s32.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vminaq_m_s8.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vminaq_s16.c           | 16 +++++++-
 .../arm/mve/intrinsics/vminaq_s32.c           | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vminaq_s8.c | 16 +++++++-
 .../arm/mve/intrinsics/vminavq_p_s16.c        | 41 ++++++++++++++++---
 .../arm/mve/intrinsics/vminavq_p_s32.c        | 41 ++++++++++++++++---
 .../arm/mve/intrinsics/vminavq_p_s8.c         | 41 ++++++++++++++++---
 .../arm/mve/intrinsics/vminavq_s16.c          | 29 ++++++++++---
 .../arm/mve/intrinsics/vminavq_s32.c          | 29 ++++++++++---
 .../arm/mve/intrinsics/vminavq_s8.c           | 29 ++++++++++---
 .../arm/mve/intrinsics/vminnmaq_f16.c         | 16 +++++++-
 .../arm/mve/intrinsics/vminnmaq_f32.c         | 16 +++++++-
 .../arm/mve/intrinsics/vminnmaq_m_f16.c       | 25 +++++++++--
 .../arm/mve/intrinsics/vminnmaq_m_f32.c       | 25 +++++++++--
 .../arm/mve/intrinsics/vminnmavq_f16.c        | 27 +++++++++---
 .../arm/mve/intrinsics/vminnmavq_f32.c        | 27 +++++++++---
 .../arm/mve/intrinsics/vminnmavq_p_f16.c      | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminnmavq_p_f32.c      | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminnmq_f16.c          | 16 +++++++-
 .../arm/mve/intrinsics/vminnmq_f32.c          | 16 +++++++-
 .../arm/mve/intrinsics/vminnmq_m_f16.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vminnmq_m_f32.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vminnmq_x_f16.c        | 25 +++++++++--
 .../arm/mve/intrinsics/vminnmq_x_f32.c        | 25 +++++++++--
 .../arm/mve/intrinsics/vminnmvq_f16.c         | 27 +++++++++---
 .../arm/mve/intrinsics/vminnmvq_f32.c         | 27 +++++++++---
 .../arm/mve/intrinsics/vminnmvq_p_f16.c       | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminnmvq_p_f32.c       | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminq_m_s16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vminq_m_s32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vminq_m_s8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vminq_m_u16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vminq_m_u32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vminq_m_u8.c           | 26 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vminq_s16.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vminq_s32.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vminq_s8.c  | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vminq_u16.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vminq_u32.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vminq_u8.c  | 16 +++++++-
 .../arm/mve/intrinsics/vminq_x_s16.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vminq_x_s32.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vminq_x_s8.c           | 25 +++++++++--
 .../arm/mve/intrinsics/vminq_x_u16.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vminq_x_u32.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vminq_x_u8.c           | 25 +++++++++--
 .../arm/mve/intrinsics/vminvq_p_s16.c         | 31 ++++++++++----
 .../arm/mve/intrinsics/vminvq_p_s32.c         | 31 ++++++++++----
 .../arm/mve/intrinsics/vminvq_p_s8.c          | 31 ++++++++++----
 .../arm/mve/intrinsics/vminvq_p_u16.c         | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminvq_p_u32.c         | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminvq_p_u8.c          | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vminvq_s16.c           | 22 ++++++----
 .../arm/mve/intrinsics/vminvq_s32.c           | 22 ++++++----
 .../gcc.target/arm/mve/intrinsics/vminvq_s8.c | 22 ++++++----
 .../arm/mve/intrinsics/vminvq_u16.c           | 29 ++++++++++---
 .../arm/mve/intrinsics/vminvq_u32.c           | 26 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vminvq_u8.c | 29 ++++++++++---
 60 files changed, 1320 insertions(+), 255 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c
index 0324110c6a8..925b9154ca7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminaq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminat.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c
index a2886d4f40f..296f69dfcda 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminaq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminat.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c
index 95eb038efc0..cf6fecc3461 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminaq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminat.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c
index 3a157e00a27..63f59f8c80a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmina.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int16x8_t b)
 {
   return vminaq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmina.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmina.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int16x8_t b)
 {
   return vminaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmina.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c
index 5c732c65d63..eb0a54cbe19 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmina.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32x4_t b)
 {
   return vminaq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmina.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmina.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32x4_t b)
 {
   return vminaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmina.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c
index 2e4dad141ce..b875308863d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmina.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int8x16_t b)
 {
   return vminaq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmina.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmina.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int8x16_t b)
 {
   return vminaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmina.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c
index 9303ae02e39..5d3c40fb1fc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, int16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint16_t a, int16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminavq_p (a, b, p);
 }
 
-
-int16_t
-foo2 (uint8_t a, int16x8_t b, mve_pred16_t p)
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint16_t
+foo2 (int16x8_t b, mve_pred16_t p)
 {
-  return vminavq_p (a, b, p);
+  return vminavq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminavt.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c
index 36247f68b2c..ee4ff251d63 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint32_t a, int32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminavq_p (a, b, p);
 }
 
-
-int32_t
-foo2 (uint16_t a, int32x4_t b, mve_pred16_t p)
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int32x4_t b, mve_pred16_t p)
 {
-  return vminavq_p (a, b, p);
+  return vminavq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminavt.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c
index d3361615dcc..14602c29719 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, int8x16_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint8_t a, int8x16_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminavq_p (a, b, p);
 }
 
-
-int8_t
-foo2 (uint32_t a, int8x16_t b, mve_pred16_t p)
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint8_t
+foo2 (int8x16_t b, mve_pred16_t p)
 {
-  return vminavq_p (a, b, p);
+  return vminavq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminavt.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c
index 17e4edca2f1..51f75ae1f6a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, int16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (uint16_t a, int16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, int16x8_t b)
 {
   return vminavq (a, b);
 }
 
-
-int16_t
-foo2 (uint8_t a, int16x8_t b)
+/*
+**foo2:
+**	...
+**	vminav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint16_t
+foo2 (int16x8_t b)
 {
-  return vminavq (a, b);
+  return vminavq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminav.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c
index 032d02b8857..d1602cebe18 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (uint32_t a, int32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int32x4_t b)
 {
   return vminavq (a, b);
 }
 
-
-int32_t
-foo2 (uint16_t a, int32x4_t b)
+/*
+**foo2:
+**	...
+**	vminav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int32x4_t b)
 {
-  return vminavq (a, b);
+  return vminavq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminav.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c
index 2a2bb3d6146..f4c9b045b90 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, int8x16_t b)
 {
@@ -11,18 +18,28 @@ foo (uint8_t a, int8x16_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, int8x16_t b)
 {
   return vminavq (a, b);
 }
 
-
-int8_t
-foo2 (uint32_t a, int8x16_t b)
+/*
+**foo2:
+**	...
+**	vminav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint8_t
+foo2 (int8x16_t b)
 {
-  return vminavq (a, b);
+  return vminavq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminav.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c
index cf32186d642..1728d104266 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vminnmaq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnma.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vminnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vminnmaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnma.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c
index 1c3f19c9e1b..42b4265d9cc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vminnmaq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnma.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vminnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vminnmaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnma.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c
index 4423903e913..51b85bd2b04 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmaq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmat.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c
index 683f40ad3d8..2f0423ecb4f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmaq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmat.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c
index fadb23e05c8..17e4ad16759 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b)
 {
   return vminnmavq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vminnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b)
+foo2 (float16x8_t b)
 {
-  return vminnmavq (a, b);
+  return vminnmavq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmav.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c
index 84714a96b9f..2758e59666e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b)
 {
   return vminnmavq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vminnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b)
+foo2 (float32x4_t b)
 {
-  return vminnmavq (a, b);
+  return vminnmavq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmav.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c
index c79fa307ae0..b60a6627aea 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmavq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
+foo2 (float16x8_t b, mve_pred16_t p)
 {
-  return vminnmavq_p (a, b, p);
+  return vminnmavq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmavt.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c
index bea04c7aac6..6fa97b74a65 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmavq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
+foo2 (float32x4_t b, mve_pred16_t p)
 {
-  return vminnmavq_p (a, b, p);
+  return vminnmavq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmavt.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c
index 18d4a4c1330..c0962b52631 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vminnmq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnm.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vminnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vminnmq (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnm.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c
index 34144cad17f..a9c3e5f74b1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vminnmq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnm.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vminnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vminnmq (a, b);
 }
 
-/* { dg-final { scan-assembler "vminnm.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c
index e5533d28035..466264249c5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmq_m_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmt.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c
index 382d16c4489..57edc8e1a80 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmq_m_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmt.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c
index 04d606ce5cd..73b4ccba080 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmq_x_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c
index 87cd970fd11..9a824566212 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmq_x_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vminnmt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c
index 0eb3a4af14e..dc00d02df7d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b)
 {
   return vminnmvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vminnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b)
+foo2 (float16x8_t b)
 {
-  return vminnmvq (a, b);
+  return vminnmvq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmv.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c
index f3183508f8e..ff23c818452 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b)
 {
   return vminnmvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vminnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b)
+foo2 (float32x4_t b)
 {
-  return vminnmvq (a, b);
+  return vminnmvq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmv.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c
index 16f6ac514c8..ad99f586d11 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
 {
   return vminnmvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
+foo2 (float16x8_t b, mve_pred16_t p)
 {
-  return vminnmvq_p (a, b, p);
+  return vminnmvq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmvt.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c
index a8e4f9ffba7..3c7e5c07a68 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
 {
   return vminnmvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
+foo2 (float32x4_t b, mve_pred16_t p)
 {
-  return vminnmvq_p (a, b, p);
+  return vminnmvq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminnmvt.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c
index f257ddcf600..fe7368eeb38 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c
index 957da71d0e3..a90a1db8835 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c
index fea8bfd7994..911bd3af0dc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c
index 7cc19a7dd5d..f80288aaf79 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vminq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vminq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c
index 301fbfc751f..b480089f4f3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vminq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vminq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c
index 7a65b3557a3..73633c9612e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vminq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vminq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c
index d46a3c4ee18..eb34dc4c41c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmin.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vminq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmin.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vminq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c
index 601e918a5bf..60d29da4e14 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmin.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vminq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmin.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vminq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
index e2ae2341ad8..675fb8edfb1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmin.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vminq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmin.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vminq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
index 3cac573f6ef..50f648d5133 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmin.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vminq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmin.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vminq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
index ca3ef245fe9..bcfead39c5a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmin.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vminq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmin.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vminq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
index b7ef4db22ff..e8eacae4da8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmin.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vminq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmin.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vminq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmin.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
index af93c78658e..0d8987e16b8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
index 76f0831e48e..3c3595171ea 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
index fdd6e94497c..402c4aa121d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
index 9842954c761..e27a3416e38 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vminq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vminq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
index 741e4508879..d3cb29bf60c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vminq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vminq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
index 13743fc87a1..3e05ef7dd13 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vminq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmint.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vminq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
index 91bb63f6ba6..7c25c9d2f82 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo (int16_t a, int16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,20 @@ foo (int16_t a, int16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo1 (int16_t a, int16x8_t b, mve_pred16_t p)
 {
   return vminvq_p (a, b, p);
 }
 
-
-int16_t
-foo2 (int8_t a, int16x8_t b, mve_pred16_t p)
-{
-  return vminvq_p (a, b, p);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminvt.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
index a846701312c..d5f7418af38 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,20 @@ foo (int32_t a, int32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int32x4_t b, mve_pred16_t p)
 {
   return vminvq_p (a, b, p);
 }
 
-
-int32_t
-foo2 (int16_t a, int32x4_t b, mve_pred16_t p)
-{
-  return vminvq_p (a, b, p);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminvt.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
index 716d414f3a7..6a42170fc19 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo (int8_t a, int8x16_t b, mve_pred16_t p)
 {
@@ -11,18 +22,20 @@ foo (int8_t a, int8x16_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo1 (int8_t a, int8x16_t b, mve_pred16_t p)
 {
   return vminvq_p (a, b, p);
 }
 
-
-int8_t
-foo2 (int32_t a, int8x16_t b, mve_pred16_t p)
-{
-  return vminvq_p (a, b, p);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminvt.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
index cc7f8fe8933..8f2f68fef84 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vminvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
-foo2 (uint32_t a, uint16x8_t b, mve_pred16_t p)
+foo2 (uint16x8_t b, mve_pred16_t p)
 {
-  return vminvq_p (a, b, p);
+  return vminvq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminvt.u16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
index 6bde0be29cc..9d14c39c1dc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vminvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo2 (uint8_t a, uint32x4_t b, mve_pred16_t p)
+foo2 (uint32x4_t b, mve_pred16_t p)
 {
-  return vminvq_p (a, b, p);
+  return vminvq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminvt.u32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
index bb894904f3c..4c1f4406852 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vminvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vminvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
-foo2 (uint16_t a, uint8x16_t b, mve_pred16_t p)
+foo2 (uint8x16_t b, mve_pred16_t p)
 {
-  return vminvq_p (a, b, p);
+  return vminvq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminvt.u8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
index 6d589aa4a05..e3242c0aa4d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo (int16_t a, int16x8_t b)
 {
@@ -11,17 +18,16 @@ foo (int16_t a, int16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo1 (int16_t a, int16x8_t b)
 {
   return vminvq (a, b);
 }
 
-int16_t
-foo2 (int8_t a, int16x8_t b)
-{
-  return vminvq (a, b);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminv.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
index 7c727d6d92b..1325b38411d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int32x4_t b)
 {
@@ -11,17 +18,16 @@ foo (int32_t a, int32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int32x4_t b)
 {
   return vminvq (a, b);
 }
 
-int32_t
-foo2 (int8_t a, int32x4_t b)
-{
-  return vminvq (a, b);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminv.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
index 76309482fc5..81c14a8ac6b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo (int8_t a, int8x16_t b)
 {
@@ -11,17 +18,16 @@ foo (int8_t a, int8x16_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo1 (int8_t a, int8x16_t b)
 {
   return vminvq (a, b);
 }
 
-int8_t
-foo2 (int32_t a, int8x16_t b)
-{
-  return vminvq (a, b);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminv.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
index 698975f456c..4372ac62388 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, uint16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (uint16_t a, uint16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, uint16x8_t b)
 {
   return vminvq (a, b);
 }
 
-
-uint8_t
-foo2 (uint32_t a, uint16x8_t b)
+/*
+**foo2:
+**	...
+**	vminv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint16_t
+foo2 (uint16x8_t b)
 {
-  return vminvq (a, b);
+  return vminvq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminv.u16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
index 7489f81debf..aff3679f49d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b)
 {
@@ -11,17 +18,28 @@ foo (uint32_t a, uint32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b)
 {
   return vminvq (a, b);
 }
 
+/*
+**foo2:
+**	...
+**	vminv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo2 (uint16_t a, uint32x4_t b)
+foo2 (uint32x4_t b)
 {
-  return vminvq (a, b);
+  return vminvq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminv.u32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
index aa2b986d558..883e5f2d2c7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vminv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, uint8x16_t b)
 {
@@ -11,18 +18,28 @@ foo (uint8_t a, uint8x16_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vminv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, uint8x16_t b)
 {
   return vminvq (a, b);
 }
 
-
-uint16_t
-foo2 (uint32_t a, uint8x16_t b)
+/*
+**foo2:
+**	...
+**	vminv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint8_t
+foo2 (uint8x16_t b)
 {
-  return vminvq (a, b);
+  return vminvq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vminv.u8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1
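
Reader's note (not part of the patch): the expected-body comments added
above list one regular expression per line, with "..." standing for any
number of intervening lines.  The sketch below is only an illustration of
that matching idea in plain C; it is not how check-function-bodies is
implemented (the real directive lives in the DejaGnu/Tcl testsuite
support code and uses Tcl regexps).  The helper names line_matches and
body_matches are hypothetical.  Because POSIX extended regexps have no
"(?:...)" groups, the optional trailing comment is written here as
"(\t@.*)?", and the dot in the mnemonic is escaped.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the whole of LINE matches the POSIX extended regex
   PATTERN, 0 otherwise (including when PATTERN fails to compile).  */
static int
line_matches (const char *pattern, const char *line)
{
  char anchored[512];
  regex_t re;
  int ok;

  /* Room for "^(" + pattern + ")$" + NUL.  */
  if (strlen (pattern) + 5 > sizeof anchored)
    return 0;
  snprintf (anchored, sizeof anchored, "^(%s)$", pattern);

  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* Return 1 if the assembly LINES match the template TMPL.  A template
   entry of "..." matches zero or more lines; every other entry must
   match exactly one line, in order.  Simple backtracking search.  */
static int
body_matches (const char *const *tmpl, size_t ntmpl,
	      const char *const *lines, size_t nlines)
{
  if (ntmpl == 0)
    return nlines == 0;

  if (strcmp (tmpl[0], "...") == 0)
    {
      /* Try letting the wildcard swallow 0, 1, 2, ... lines.  */
      for (size_t skip = 0; skip <= nlines; skip++)
	if (body_matches (tmpl + 1, ntmpl - 1, lines + skip, nlines - skip))
	  return 1;
      return 0;
    }

  return nlines > 0
	 && line_matches (tmpl[0], lines[0])
	 && body_matches (tmpl + 1, ntmpl - 1, lines + 1, nlines - 1);
}
```

With a template of "...", the vmsr line, "...", the vpst line, "...", the
vmint line and a final "...", this accepts a body only when the three
instructions appear in that order.  A plain scan-assembler "vpst" cannot
express that ordering, and a loose scan-assembler "vmin.s16" says
nothing about the operands.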



* [PATCH 09/35] arm: improve tests for vmax*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (7 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 08/35] arm: improve tests for vmin* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:42   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 10/35] arm: improve tests for vabavq* Andrea Corallo
                   ` (26 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vmaxaq_m_s16.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxaq_m_s32.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxaq_m_s8.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxaq_s16.c           | 16 +++++++-
 .../arm/mve/intrinsics/vmaxaq_s32.c           | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vmaxaq_s8.c | 16 +++++++-
 .../arm/mve/intrinsics/vmaxavq_p_s16.c        | 41 ++++++++++++++++---
 .../arm/mve/intrinsics/vmaxavq_p_s32.c        | 41 ++++++++++++++++---
 .../arm/mve/intrinsics/vmaxavq_p_s8.c         | 41 ++++++++++++++++---
 .../arm/mve/intrinsics/vmaxavq_s16.c          | 29 ++++++++++---
 .../arm/mve/intrinsics/vmaxavq_s32.c          | 29 ++++++++++---
 .../arm/mve/intrinsics/vmaxavq_s8.c           | 29 ++++++++++---
 .../arm/mve/intrinsics/vmaxnmaq_f16.c         | 16 +++++++-
 .../arm/mve/intrinsics/vmaxnmaq_f32.c         | 16 +++++++-
 .../arm/mve/intrinsics/vmaxnmaq_m_f16.c       | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxnmaq_m_f32.c       | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxnmavq_f16.c        | 27 +++++++++---
 .../arm/mve/intrinsics/vmaxnmavq_f32.c        | 27 +++++++++---
 .../arm/mve/intrinsics/vmaxnmavq_p_f16.c      | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxnmavq_p_f32.c      | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxnmq_f16.c          | 16 +++++++-
 .../arm/mve/intrinsics/vmaxnmq_f32.c          | 16 +++++++-
 .../arm/mve/intrinsics/vmaxnmq_m_f16.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxnmq_m_f32.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxnmq_x_f16.c        | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxnmq_x_f32.c        | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxnmvq_f16.c         | 27 +++++++++---
 .../arm/mve/intrinsics/vmaxnmvq_f32.c         | 27 +++++++++---
 .../arm/mve/intrinsics/vmaxnmvq_p_f16.c       | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxnmvq_p_f32.c       | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxq_m_s16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxq_m_s32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxq_m_s8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxq_m_u16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxq_m_u32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmaxq_m_u8.c           | 26 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vmaxq_s16.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vmaxq_s32.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vmaxq_s8.c  | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vmaxq_u16.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vmaxq_u32.c | 16 +++++++-
 .../gcc.target/arm/mve/intrinsics/vmaxq_u8.c  | 16 +++++++-
 .../arm/mve/intrinsics/vmaxq_x_s16.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxq_x_s32.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxq_x_s8.c           | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxq_x_u16.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxq_x_u32.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxq_x_u8.c           | 25 +++++++++--
 .../arm/mve/intrinsics/vmaxvq_p_s16.c         | 31 ++++++++++----
 .../arm/mve/intrinsics/vmaxvq_p_s32.c         | 31 ++++++++++----
 .../arm/mve/intrinsics/vmaxvq_p_s8.c          | 31 ++++++++++----
 .../arm/mve/intrinsics/vmaxvq_p_u16.c         | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxvq_p_u32.c         | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxvq_p_u8.c          | 39 +++++++++++++++---
 .../arm/mve/intrinsics/vmaxvq_s16.c           | 23 +++++++----
 .../arm/mve/intrinsics/vmaxvq_s32.c           | 23 +++++++----
 .../gcc.target/arm/mve/intrinsics/vmaxvq_s8.c | 23 +++++++----
 .../arm/mve/intrinsics/vmaxvq_u16.c           | 27 +++++++++---
 .../arm/mve/intrinsics/vmaxvq_u32.c           | 27 +++++++++---
 .../gcc.target/arm/mve/intrinsics/vmaxvq_u8.c | 27 +++++++++---
 60 files changed, 1318 insertions(+), 257 deletions(-)
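
Note for readers of the hunks below: each test swaps loose `scan-assembler` substring checks for a `check-function-bodies` template, where every `**` line is a regular expression for one assembly line and `**	...` stands for any run of lines. The Python sketch below only approximates that matching so the difference is easy to see. The real logic is Tcl code in the GCC testsuite support library. The helper names (`body_to_regex`, `body_matches`) and the sample assembly strings are made up for illustration and are not taken from the patch or from actual compiler output.

```python
import re

def body_to_regex(lines):
    """Build one regex from expected-body lines (the "**\t" prefix already stripped).

    Assumption: this mirrors check-function-bodies only roughly. A "..." line
    becomes a wildcard; any other line must match one tab-indented asm line.
    """
    parts = []
    for line in lines:
        if line == "...":
            parts.append(".*")                # any run of text, newlines included
        else:
            parts.append("\t" + line + "\n")  # exactly one instruction line
    return re.compile("".join(parts), re.DOTALL)

def body_matches(lines, asm):
    """True when the whole function body matches the template."""
    return body_to_regex(lines).fullmatch(asm) is not None

# Template copied from the vmaxaq_m_s16.c hunk below.
EXPECTED = [
    "...",
    "vmsr\tp0, (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
    "vpst(?:\t@.*|)",
    "...",
    "vmaxat.s16\tq[0-9]+, q[0-9]+(?:\t@.*|)",
    "...",
]

# Hypothetical function bodies, written by hand for this sketch.
GOOD = "\tvmsr\tp0, r0\t@ movhi\n\tvpst\n\tvmaxat.s16\tq0, q1\n\tbx\tlr\n"
MISSING_VPST = "\tvmsr\tp0, r0\t@ movhi\n\tvmaxat.s16\tq0, q1\n\tbx\tlr\n"
WRONG_ORDER = "\tvmaxat.s16\tq0, q1\n\tvmsr\tp0, r0\n\tvpst\n\tbx\tlr\n"

def old_style(asm):
    """The removed checks: two independent substring scans, no ordering."""
    return bool(re.search("vpst", asm)) and bool(re.search("vmaxat.s16", asm))

if __name__ == "__main__":
    for name, asm in [("good", GOOD), ("missing vpst", MISSING_VPST),
                      ("wrong order", WRONG_ORDER)]:
        print(name, "old:", old_style(asm), "new:", body_matches(EXPECTED, asm))
```

In this sketch the out-of-order body still passes the old substring scans but fails the template, which is the kind of ordering and operand check the new tests add on top of the old ones.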

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c
index 48d213277df..4c487ed7f60 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxaq_m_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxat.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c
index 49273819861..5156467f0c1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxaq_m_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxat.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c
index 5ecdb2c19dc..6564bd88c9b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxaq_m_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxat.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c
index f9a9f896aa2..6cabf9f723b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxa.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int16x8_t b)
 {
   return vmaxaq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxa.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxa.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int16x8_t b)
 {
   return vmaxaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxa.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c
index efe2fc16ff7..d0dd3c23600 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxa.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32x4_t b)
 {
   return vmaxaq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxa.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxa.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32x4_t b)
 {
   return vmaxaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxa.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c
index 5c2e35f71a6..a7344638dcf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxa.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int8x16_t b)
 {
   return vmaxaq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxa.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxa.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int8x16_t b)
 {
   return vmaxaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxa.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c
index 74ffad4e726..ac81c8fd1bd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, int16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint16_t a, int16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxavq_p (a, b, p);
 }
 
-
-int16_t
-foo2 (uint8_t a, int16x8_t b, mve_pred16_t p)
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint16_t
+foo2 (int16x8_t b, mve_pred16_t p)
 {
-  return vmaxavq_p (a, b, p);
+  return vmaxavq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxavt.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c
index 40800b0f12e..119c0c34c76 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint32_t a, int32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxavq_p (a, b, p);
 }
 
-
-int32_t
-foo2 (uint16_t a, int32x4_t b, mve_pred16_t p)
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int32x4_t b, mve_pred16_t p)
 {
-  return vmaxavq_p (a, b, p);
+  return vmaxavq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxavt.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c
index 7638737fb84..dfd7f828ef6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, int8x16_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint8_t a, int8x16_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxavq_p (a, b, p);
 }
 
-
-int8_t
-foo2 (uint32_t a, int8x16_t b, mve_pred16_t p)
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint8_t
+foo2 (int8x16_t b, mve_pred16_t p)
 {
-  return vmaxavq_p (a, b, p);
+  return vmaxavq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxavt.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c
index 0dca149b3e8..9f59e8e4542 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, int16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (uint16_t a, int16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, int16x8_t b)
 {
   return vmaxavq (a, b);
 }
 
-
-int16_t
-foo2 (uint8_t a, int16x8_t b)
+/*
+**foo2:
+**	...
+**	vmaxav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint16_t
+foo2 (int16x8_t b)
 {
-  return vmaxavq (a, b);
+  return vmaxavq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxav.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c
index f419a771017..716b8a2a979 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (uint32_t a, int32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int32x4_t b)
 {
   return vmaxavq (a, b);
 }
 
-
-int32_t
-foo2 (uint16_t a, int32x4_t b)
+/*
+**foo2:
+**	...
+**	vmaxav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int32x4_t b)
 {
-  return vmaxavq (a, b);
+  return vmaxavq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxav.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c
index 214ad88f4aa..0f1a87af54b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, int8x16_t b)
 {
@@ -11,18 +18,28 @@ foo (uint8_t a, int8x16_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, int8x16_t b)
 {
   return vmaxavq (a, b);
 }
 
-
-int8_t
-foo2 (uint32_t a, int8x16_t b)
+/*
+**foo2:
+**	...
+**	vmaxav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint8_t
+foo2 (int8x16_t b)
 {
-  return vmaxavq (a, b);
+  return vmaxavq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxav.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c
index f19707125db..cd4c813bf3b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vmaxnmaq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnma.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vmaxnmaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnma.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c
index 94fc3a2aa28..527466fc131 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vmaxnmaq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnma.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vmaxnmaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnma.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c
index b2e82f5464c..39c68cdc172 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmaq_m_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmat.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c
index 8fa7344b054..f6f8bf07549 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmaq_m_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmat.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmaq_m (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c
index 6d8cf19a341..4c1f20be036 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b)
 {
   return vmaxnmavq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b)
+foo2 (float16x8_t b)
 {
-  return vmaxnmavq (a, b);
+  return vmaxnmavq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmav.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c
index ef79030d8eb..86087335cea 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b)
 {
   return vmaxnmavq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b)
+foo2 (float32x4_t b)
 {
-  return vmaxnmavq (a, b);
+  return vmaxnmavq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmav.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
index f7f39f59dad..a4973567d5e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmavq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
+foo2 (float16x8_t b, mve_pred16_t p)
 {
-  return vmaxnmavq_p (a, b, p);
+  return vmaxnmavq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmavt.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
index 341f6254a5a..b229cb3a322 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmavq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
+foo2 (float32x4_t b, mve_pred16_t p)
 {
-  return vmaxnmavq_p (a, b, p);
+  return vmaxnmavq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmavt.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
index 59a8070e07b..faf968ebb21 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vmaxnmq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnm.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vmaxnmq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnm.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
index 5db42bd4b8c..f7ee01b1f14 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vmaxnmq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnm.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmaxnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vmaxnmq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmaxnm.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
index 4668fd03c9d..ee3444393ed 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmq_m_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmt.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
index 9e8ccbc84b7..5d434432856 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmq_m_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmt.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
index ecca6069d22..dad76734fd8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmq_x_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
index c3965dda4f1..2fe8c0d4f3d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmq_x_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxnmt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
index 80bd1d4cda1..9787cc1ba90 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b)
 {
   return vmaxnmvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b)
+foo2 (float16x8_t b)
 {
-  return vmaxnmvq (a, b);
+  return vmaxnmvq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmv.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
index bb2fc46f88a..b1191876850 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b)
 {
   return vmaxnmvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b)
+foo2 (float32x4_t b)
 {
-  return vmaxnmvq (a, b);
+  return vmaxnmvq (1.1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmv.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
index 3efe203007b..0b1740d5ed2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo (float16_t a, float16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
 foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmaxnmvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float16_t
-foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
+foo2 (float16x8_t b, mve_pred16_t p)
 {
-  return vmaxnmvq_p (a, b, p);
+  return vmaxnmvq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmvt.f16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
index 6c13247f1f1..ca6ad91d24d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo (float32_t a, float32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
 foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmaxnmvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 float32_t
-foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
+foo2 (float32x4_t b, mve_pred16_t p)
 {
-  return vmaxnmvq_p (a, b, p);
+  return vmaxnmvq_p (1.1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxnmvt.f32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
index 2791ed4c562..548824fc58a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
index 27f7d5d7b16..e935729b47d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
index 23b7569f720..8028fa031c7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
index 61e51e3b830..e872f9e72f8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmaxq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmaxq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
index 23df7eeaed6..76606555881 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmaxq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmaxq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
index 138d5c87894..7ade467cafd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmaxq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmaxq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
index a42fc82a852..bf547a2420d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmax.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vmaxq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmax.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vmaxq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
index 14c094a5d11..25bb950c0bf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmax.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vmaxq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmax.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vmaxq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
index 0540a27bae9..33057f1a58e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmax.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vmaxq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmax.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vmaxq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
index 6b9b5a73bcd..7717a9a5057 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmax.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vmaxq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmax.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vmaxq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
index 3112302bf1a..36b5c276cfe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmax.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vmaxq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmax.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vmaxq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
index b1baa5083bd..e643e5f3e3c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmax.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vmaxq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmax.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vmaxq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmax.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
index 9d92f2ccd85..a32feb0d7cd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
index 200fd4b1bb1..3ac1994c4f8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
index 2fe752558b9..c9ba33d1504 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
index 967622e331c..954a9e2f02a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmaxq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmaxq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
index 56b5d8fa8b8..022d418af84 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmaxq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmaxq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
index 1816f959dd7..7e1687a8b72 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmaxq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmaxt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmaxq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
index 657efc51bea..a97703eb58c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo (int16_t a, int16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,20 @@ foo (int16_t a, int16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo1 (int16_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmaxvq_p (a, b, p);
 }
 
-
-int16_t
-foo2 (int8_t a, int16x8_t b, mve_pred16_t p)
-{
-  return vmaxvq_p (a, b, p);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxvt.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
index 5882351c0fa..b4bddcb8312 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,20 @@ foo (int32_t a, int32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmaxvq_p (a, b, p);
 }
 
-
-int32_t
-foo2 (int16_t a, int32x4_t b, mve_pred16_t p)
-{
-  return vmaxvq_p (a, b, p);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxvt.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
index 3737ecd3307..ee8c3e9155f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo (int8_t a, int8x16_t b, mve_pred16_t p)
 {
@@ -11,18 +22,20 @@ foo (int8_t a, int8x16_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo1 (int8_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmaxvq_p (a, b, p);
 }
 
-
-int8_t
-foo2 (int32_t a, int8x16_t b, mve_pred16_t p)
-{
-  return vmaxvq_p (a, b, p);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxvt.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
index 348cf39caa0..906adf85936 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmaxvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
-foo2 (uint32_t a, uint16x8_t b, mve_pred16_t p)
+foo2 (uint16x8_t b, mve_pred16_t p)
 {
-  return vmaxvq_p (a, b, p);
+  return vmaxvq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxvt.u16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
index f2e976216c5..acc5367c5a2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmaxvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo2 (uint8_t a, uint32x4_t b, mve_pred16_t p)
+foo2 (uint32x4_t b, mve_pred16_t p)
 {
-  return vmaxvq_p (a, b, p);
+  return vmaxvq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxvt.u32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
index 7df5b63c9bc..358cb40f829 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
@@ -1,9 +1,20 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
 {
@@ -11,18 +22,36 @@ foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmaxvq_p (a, b, p);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmaxvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
-foo2 (uint16_t a, uint8x16_t b, mve_pred16_t p)
+foo2 (uint8x16_t b, mve_pred16_t p)
 {
-  return vmaxvq_p (a, b, p);
+  return vmaxvq_p (1, b, p);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxvt.u8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
index 8412452cf33..485355a7d72 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo (int16_t a, int16x8_t b)
 {
@@ -11,18 +18,16 @@ foo (int16_t a, int16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int16_t
 foo1 (int16_t a, int16x8_t b)
 {
   return vmaxvq (a, b);
 }
 
-
-int16_t
-foo2 (int8_t a, int16x8_t b)
-{
-  return vmaxvq (a, b);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxv.s16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
index 09f4909c9a8..3b9075689a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int32x4_t b)
 {
@@ -11,18 +18,16 @@ foo (int32_t a, int32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int32x4_t b)
 {
   return vmaxvq (a, b);
 }
 
-
-int32_t
-foo2 (int16_t a, int32x4_t b)
-{
-  return vmaxvq (a, b);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxv.s32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
index a087bbc6b64..f13a0168d9d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo (int8_t a, int8x16_t b)
 {
@@ -11,18 +18,16 @@ foo (int8_t a, int8x16_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int8_t
 foo1 (int8_t a, int8x16_t b)
 {
   return vmaxvq (a, b);
 }
 
-
-int8_t
-foo2 (int32_t a, int8x16_t b)
-{
-  return vmaxvq (a, b);
-}
-
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxv.s8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
index 47fe0d1cf0f..6a0fe254043 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo (uint16_t a, uint16x8_t b)
 {
@@ -11,18 +18,28 @@ foo (uint16_t a, uint16x8_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
 foo1 (uint16_t a, uint16x8_t b)
 {
   return vmaxvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16_t
-foo2 (uint32_t a, uint16x8_t b)
+foo2 (uint16x8_t b)
 {
-  return vmaxvq (a, b);
+  return vmaxvq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxv.u16" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
index aa723daf5dd..eed20046e53 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b)
 {
@@ -11,18 +18,28 @@ foo (uint32_t a, uint32x4_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b)
 {
   return vmaxvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo2 (uint8_t a, uint32x4_t b)
+foo2 (uint32x4_t b)
 {
-  return vmaxvq (a, b);
+  return vmaxvq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxv.u32" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
index 3aae785040c..d44a6d3bb02 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
@@ -1,9 +1,16 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmaxv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo (uint8_t a, uint8x16_t b)
 {
@@ -11,18 +18,28 @@ foo (uint8_t a, uint8x16_t b)
 }
 
 
+/*
+**foo1:
+**	...
+**	vmaxv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
 foo1 (uint8_t a, uint8x16_t b)
 {
   return vmaxvq (a, b);
 }
 
-
+/*
+**foo2:
+**	...
+**	vmaxv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8_t
-foo2 (uint16_t a, uint8x16_t b)
+foo2 (uint8x16_t b)
 {
-  return vmaxvq (a, b);
+  return vmaxvq (1, b);
 }
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
-/* { dg-final { scan-assembler-times "vmaxv.u8" 3 } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 10/35] arm: improve tests for vabavq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (8 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 09/35] arm: improve tests for vmax* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:43   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 11/35] arm: improve tests for vabdq* Andrea Corallo
                   ` (25 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vabavq_p_s16.c:
	* gcc.target/arm/mve/intrinsics/vabavq_p_s32.c:
	* gcc.target/arm/mve/intrinsics/vabavq_p_s8.c:
	* gcc.target/arm/mve/intrinsics/vabavq_p_u16.c:
	* gcc.target/arm/mve/intrinsics/vabavq_p_u32.c:
	* gcc.target/arm/mve/intrinsics/vabavq_p_u8.c:
	* gcc.target/arm/mve/intrinsics/vabavq_s16.c:
	* gcc.target/arm/mve/intrinsics/vabavq_s32.c:
	* gcc.target/arm/mve/intrinsics/vabavq_s8.c:
	* gcc.target/arm/mve/intrinsics/vabavq_u16.c:
	* gcc.target/arm/mve/intrinsics/vabavq_u32.c:
	* gcc.target/arm/mve/intrinsics/vabavq_u8.c:
---
 .../arm/mve/intrinsics/vabavq_p_s16.c         | 40 ++++++++++++++++++-
 .../arm/mve/intrinsics/vabavq_p_s32.c         | 40 ++++++++++++++++++-
 .../arm/mve/intrinsics/vabavq_p_s8.c          | 40 ++++++++++++++++++-
 .../arm/mve/intrinsics/vabavq_p_u16.c         | 40 ++++++++++++++++++-
 .../arm/mve/intrinsics/vabavq_p_u32.c         | 40 ++++++++++++++++++-
 .../arm/mve/intrinsics/vabavq_p_u8.c          | 40 ++++++++++++++++++-
 .../arm/mve/intrinsics/vabavq_s16.c           | 28 ++++++++++++-
 .../arm/mve/intrinsics/vabavq_s32.c           | 28 ++++++++++++-
 .../gcc.target/arm/mve/intrinsics/vabavq_s8.c | 28 ++++++++++++-
 .../arm/mve/intrinsics/vabavq_u16.c           | 28 ++++++++++++-
 .../arm/mve/intrinsics/vabavq_u32.c           | 28 ++++++++++++-
 .../gcc.target/arm/mve/intrinsics/vabavq_u8.c | 28 ++++++++++++-
 12 files changed, 384 insertions(+), 24 deletions(-)
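
Note on reading the expected bodies in this patch: each "**" block lists
the instructions that must appear in the function, in that order, and
each "..." line allows any number of other lines in between.  The sketch
below is only a simplified model of that ordering rule, not DejaGnu's
actual check-function-bodies implementation.  It assumes a POSIX system
with <regex.h>; the patterns are hand-translated to POSIX extended
syntax (the "(?:...)" groups become plain groups and the empty
alternative becomes an optional group); and the names line_matches,
body_matches, vabavq_p_patterns and sample_asm, as well as the sample
assembly itself, are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Return 1 if the whole of LINE matches the POSIX extended regex
   PATTERN, 0 otherwise (including when PATTERN fails to compile).  */
int
line_matches (const char *pattern, const char *line)
{
  char anchored[256];
  regex_t re;
  int n, ok;

  /* Anchor the pattern so that it has to cover the entire line.  */
  n = snprintf (anchored, sizeof anchored, "^(%s)$", pattern);
  if (n < 0 || (size_t) n >= sizeof anchored)
    return 0;
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* Return 1 if every pattern matches some line of LINES, in order,
   with any number of unrelated lines before, between and after
   (the role the "..." lines play in the expected bodies).  */
int
body_matches (const char *const *patterns, size_t n_patterns,
	      const char *const *lines, size_t n_lines)
{
  size_t p = 0;

  assert (patterns != NULL && lines != NULL);
  for (size_t i = 0; i < n_lines && p < n_patterns; i++)
    if (line_matches (patterns[p], lines[i]))
      p++;
  return p == n_patterns;
}

/* The three instruction patterns used for the vabavq_p_s16 tests,
   translated to POSIX extended syntax.  */
static const char *const vabavq_p_patterns[] = {
  "\tvmsr\tp0, (ip|fp|r[0-9]+)(\t@.*)?",
  "\tvpst(\t@.*)?",
  "\tvabavt.s16\t(ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(\t@.*)?",
};

/* Hypothetical assembly for one function body, for illustration only.  */
static const char *const sample_asm[] = {
  "\tvmsr\tp0, r1\t@ movhi",
  "\tvpst",
  "\tvabavt.s16\tr0, q0, q1",
  "\tbx\tlr",
};
```

Calling body_matches (vabavq_p_patterns, 3, sample_asm, 4) on the sample
above succeeds.  A listing where vpst precedes vmsr, or where the
predicate register is written " P0" instead of "p0", does not match,
which is the kind of difference the old scan-assembler checks could not
detect.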

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
index 78ac801fa3c..843d022c418 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
 {
   return vabavq_p_s16 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
 {
   return vabavq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.s16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int16x8_t b, int16x8_t c, mve_pred16_t p)
+{
+  return vabavq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
index af4e30b6127..6ed9b9ac1c4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
 {
   return vabavq_p_s32 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
 {
   return vabavq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.s32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int32x4_t b, int32x4_t c, mve_pred16_t p)
+{
+  return vabavq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
index a76b6bd4bda..ec34be92a28 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
 {
   return vabavq_p_s8 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
 {
   return vabavq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.s8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int8x16_t b, int8x16_t c, mve_pred16_t p)
+{
+  return vabavq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
index 9627a00b812..440b603a18e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
 {
   return vabavq_p_u16 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
 {
   return vabavq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint16x8_t b, uint16x8_t c, mve_pred16_t p)
+{
+  return vabavq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
index 298c2c38101..9500ee054b1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
 {
   return vabavq_p_u32 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
 {
   return vabavq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint32x4_t b, uint32x4_t c, mve_pred16_t p)
+{
+  return vabavq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
index 775072225f8..40c9a51fbe4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
 {
   return vabavq_p_u8 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
 {
   return vabavq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vabavt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabavt.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint8x16_t b, uint8x16_t c, mve_pred16_t p)
+{
+  return vabavq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
index c2383f1865b..27684fa4a88 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabav.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int16x8_t b, int16x8_t c)
 {
   return vabavq_s16 (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabav.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int16x8_t b, int16x8_t c)
 {
   return vabavq (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.s16"  }  } */
+/*
+**foo2:
+**	...
+**	vabav.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int16x8_t b, int16x8_t c)
+{
+  return vabavq (1, b, c);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
index 7170d013c3b..f595609a2a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabav.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int32x4_t b, int32x4_t c)
 {
   return vabavq_s32 (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabav.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int32x4_t b, int32x4_t c)
 {
   return vabavq (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.s32"  }  } */
+/*
+**foo2:
+**	...
+**	vabav.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int32x4_t b, int32x4_t c)
+{
+  return vabavq (1, b, c);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
index d75ecdbdbdf..60fa9e23b7b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabav.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, int8x16_t b, int8x16_t c)
 {
   return vabavq_s8 (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabav.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, int8x16_t b, int8x16_t c)
 {
   return vabavq (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.s8"  }  } */
+/*
+**foo2:
+**	...
+**	vabav.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (int8x16_t b, int8x16_t c)
+{
+  return vabavq (1, b, c);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
index 40ab94d9083..f3255276eda 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabav.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint16x8_t b, uint16x8_t c)
 {
   return vabavq_u16 (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabav.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint16x8_t b, uint16x8_t c)
 {
   return vabavq (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vabav.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint16x8_t b, uint16x8_t c)
+{
+  return vabavq (1, b, c);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
index 4b9f5c32f3d..f41fa1f3952 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabav.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b, uint32x4_t c)
 {
   return vabavq_u32 (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabav.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b, uint32x4_t c)
 {
   return vabavq (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vabav.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint32x4_t b, uint32x4_t c)
+{
+  return vabavq (1, b, c);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
index 3638e9d7106..3a2654435df 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabav.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint8x16_t b, uint8x16_t c)
 {
   return vabavq_u8 (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabav.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint8x16_t b, uint8x16_t c)
 {
   return vabavq (a, b, c);
 }
 
-/* { dg-final { scan-assembler "vabav.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vabav.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint8x16_t b, uint8x16_t c)
+{
+  return vabavq (1, b, c);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 11/35] arm: improve tests for vabdq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (9 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 10/35] arm: improve tests for vabavq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:44   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 12/35] arm: improve tests and fix vabsq* Andrea Corallo
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vabdq_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vabdq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_u8.c: Likewise.
---
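Note on the new patterns: each converted test swaps the loose
scan-assembler substring checks for a check-function-bodies block, so
the operand registers and the order vmsr -> vpst -> vabdt are now
checked too. The sketch below only approximates that matching in
Python, to show what the patterns accept and reject. It is not the
DejaGnu implementation: the helper name body_matches, the treatment of
"..." as "skip any number of lines", and the sample assembly are
assumptions made for illustration. The regex fragments themselves are
copied from the tests in this patch.

```python
import re

# Fragments copied from the expected bodies in this patch.
CORE_REG = r"(?:ip|fp|r[0-9]+)"   # scalar register operand
Q_REG = r"q[0-9]+"                # MVE vector register operand
TRAILING = r"(?:\t@.*|)"          # optional trailing assembler comment


def body_matches(expected_lines, asm_lines):
    """Hypothetical approximation of a check-function-bodies match.

    "..." skips any number of whole lines; every other expected line is
    a regex that must match one complete assembly line, in order.
    """
    parts = []
    for line in expected_lines:
        if line == "...":
            parts.append(r"(?:.*\n)*?")
        else:
            parts.append(line + r"\n")
    text = "".join(asm + "\n" for asm in asm_lines)
    return re.fullmatch("".join(parts), text) is not None


# Expected body of the predicated tests (e.g. vabdq_m_f16.c), minus the
# "**" prefix and the function-name line.
EXPECTED_VABDT_F16 = [
    "...",
    "\tvmsr\tp0, " + CORE_REG + TRAILING,
    "...",
    "\tvpst" + TRAILING,
    "...",
    "\tvabdt.f16\t" + ", ".join([Q_REG] * 3) + TRAILING,
    "...",
]

# Made-up assembly for illustration, not real compiler output.
SAMPLE_ASM = [
    "\tvmsr\tp0, r0\t@ movhi",
    "\tvpst",
    "\tvabdt.f16\tq0, q1, q2",
    "\tbx\tlr",
]

if __name__ == "__main__":
    print(body_matches(EXPECTED_VABDT_F16, SAMPLE_ASM))
```

With the old directives a body containing vpst and vabdt.f16 but no
vmsr would still pass. In this sketch it is rejected, which is the
kind of stricter check the series is after.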
 .../gcc.target/arm/mve/intrinsics/vabdq_f16.c | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vabdq_f32.c | 16 ++++++++++--
 .../arm/mve/intrinsics/vabdq_m_f16.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_f32.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_s16.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_s32.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_s8.c           | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_u16.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_u32.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_m_u8.c           | 26 ++++++++++++++++---
 .../gcc.target/arm/mve/intrinsics/vabdq_s16.c | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vabdq_s32.c | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vabdq_s8.c  | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vabdq_u16.c | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vabdq_u32.c | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vabdq_u8.c  | 16 ++++++++++--
 .../arm/mve/intrinsics/vabdq_x_f16.c          | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_f32.c          | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_s16.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_s32.c          | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_s8.c           | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_u16.c          | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_u32.c          | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vabdq_x_u8.c           | 25 +++++++++++++++---
 24 files changed, 464 insertions(+), 73 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
index b55e826e4b6..f379b25c49e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vabdq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
index f1a95b14e03..3ba808e0b4d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vabdq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
index f92e671edec..903c6dfe861 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vabdq_m_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
index 5e30997c997..4ddf4ee5c61 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vabdq_m_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
index 35809895dea..c719a0b5e9c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vabdq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
index 77d97e1db63..048554144cd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vabdq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
index a0004d9f290..458b920b5cb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vabdq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
index c4dc9a469da..8e163edb153 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vabdq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
index 18a64d3a19d..619d4706dc5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vabdq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
index 494f39cb857..079478df08a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vabdq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vabdq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
index 73773ac9ebc..0dce4c482ac 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vabdq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
index 3c552a2969e..f5908fe81d8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vabdq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
index f7de6f707ac..3f249e1a622 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vabdq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
index 90d1c873cca..16a4b930d2c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vabdq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
index 405dca51466..2b5ee12945c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vabdq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
index 2b693c16520..50a4c162c9b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vabdq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vabdq (a, b);
 }
 
-/* { dg-final { scan-assembler "vabd.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
index 9d771a3325f..da142f4394b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vabdq_x_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
index 498851348d5..1ff1bef258f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vabdq_x_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
index 1fa77cc5cae..6733e2bcc14 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vabdq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
index 24a62702482..8d7631b9ac6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vabdq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
index f96c2dfd147..90784c1d389 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vabdq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
index 820b8416330..f376374564a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vabdq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
index 2d81930348a..d9467a1ccd7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vabdq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
index 7f956850b52..1ea3713d12b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vabdq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabdt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vabdq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 12/35] arm: improve tests and fix vabsq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (10 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 11/35] arm: improve tests for vabdq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:45   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic Andrea Corallo
                   ` (23 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vabsq_f<mode>): Fix spacing.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vabsq_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_s8.c: Likewise.
---
 gcc/config/arm/mve.md                         |  2 +-
 .../gcc.target/arm/mve/intrinsics/vabsq_f16.c | 22 +++++++++++++++-
 .../gcc.target/arm/mve/intrinsics/vabsq_f32.c | 22 +++++++++++++++-
 .../arm/mve/intrinsics/vabsq_m_f16.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_m_f32.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_m_s16.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_m_s32.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_m_s8.c           | 25 ++++++++++++++++---
 .../gcc.target/arm/mve/intrinsics/vabsq_s16.c | 20 ++++++++++++---
 .../gcc.target/arm/mve/intrinsics/vabsq_s32.c | 20 ++++++++++++---
 .../gcc.target/arm/mve/intrinsics/vabsq_s8.c  | 16 ++++++++++--
 .../arm/mve/intrinsics/vabsq_x_f16.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_x_f32.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_x_s16.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_x_s32.c          | 25 ++++++++++++++++---
 .../arm/mve/intrinsics/vabsq_x_s8.c           | 25 ++++++++++++++++---
 16 files changed, 309 insertions(+), 43 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3330a220aea..bc4e2f2ac21 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -279,7 +279,7 @@ (define_insn "mve_vabsq_f<mode>"
 	(abs:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vabs.f%#<V_sz_elem>  %q0, %q1"
+  "vabs.f%#<V_sz_elem>\t%q0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
index 08e141baedc..f29ada8c058 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
@@ -1,13 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabs.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a)
 {
   return vabsq_f16 (a);
 }
 
-/* { dg-final { scan-assembler "vabs.f16"  }  } */
+
+/*
+**foo1:
+**	...
+**	vabs.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 (float16x8_t a)
+{
+  return vabsq (a);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
index 3614a44fbdc..cc24744fb26 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
@@ -1,13 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabs.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a)
 {
   return vabsq_f32 (a);
 }
 
-/* { dg-final { scan-assembler "vabs.f32"  }  } */
+
+/*
+**foo1:
+**	...
+**	vabs.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 (float32x4_t a)
+{
+  return vabsq (a);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
index 30c14a151af..21cf284d045 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
 {
   return vabsq_m_f16 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
 {
   return vabsq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
index 652056aa98c..236830b3a9e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
 {
   return vabsq_m_f32 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
 {
   return vabsq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
index 2dcf488bd0d..22f7b37b30b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, mve_pred16_t p)
 {
   return vabsq_m_s16 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, mve_pred16_t p)
 {
   return vabsq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
index 183909fef93..b3021edf52b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, mve_pred16_t p)
 {
   return vabsq_m_s32 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, mve_pred16_t p)
 {
   return vabsq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
index cd17974838e..da9ff2f978a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, mve_pred16_t p)
 {
   return vabsq_m_s8 (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, mve_pred16_t p)
 {
   return vabsq_m (inactive, a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
index 243afebc38c..84906302c8a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
@@ -1,21 +1,33 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabs.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a)
 {
   return vabsq_s16 (a);
 }
 
-/* { dg-final { scan-assembler "vabs.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabs.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a)
 {
   return vabsq (a);
 }
 
-/* { dg-final { scan-assembler "vabs.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
index d9843503a48..117c787d595 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
@@ -1,21 +1,33 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabs.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a)
 {
   return vabsq_s32 (a);
 }
 
-/* { dg-final { scan-assembler "vabs.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabs.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a)
 {
   return vabsq (a);
 }
 
-/* { dg-final { scan-assembler "vabs.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
index 93bf1520dd3..a7f1413505c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vabs.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a)
 {
   return vabsq_s8 (a);
 }
 
-/* { dg-final { scan-assembler "vabs.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vabs.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a)
 {
   return vabsq (a);
 }
 
-/* { dg-final { scan-assembler "vabs.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
index d1fc7002ccb..f24a8cccb53 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, mve_pred16_t p)
 {
   return vabsq_x_f16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, mve_pred16_t p)
 {
   return vabsq_x (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
index 0beccac030d..fd4c2277969 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, mve_pred16_t p)
 {
   return vabsq_x_f32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, mve_pred16_t p)
 {
   return vabsq_x (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
index fd67fd5ccac..0e1d1bb94d4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, mve_pred16_t p)
 {
   return vabsq_x_s16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, mve_pred16_t p)
 {
   return vabsq_x (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
index 22d561d1e46..64d0e4b574d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, mve_pred16_t p)
 {
   return vabsq_x_s32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, mve_pred16_t p)
 {
   return vabsq_x (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
index 6908a6ca20c..742bc701fae 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, mve_pred16_t p)
 {
   return vabsq_x_s8 (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vabst.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, mve_pred16_t p)
 {
   return vabsq_x (a, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (11 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 12/35] arm: improve tests and fix vabsq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:49   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters Andrea Corallo
                   ` (22 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Stam Markianos-Wright

From: Stam Markianos-Wright <stam.markianos-wright@arm.com>

It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
resolution would fall back to the `__ARM_undef` failure state.

This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79` and
`6bd4ce64eb48a72eca300cb52773e6101d646004`, but it was not identified
earlier because the tests did not check for this kind of failure.

The above commits changed the definitions of the intrinsics from using
`[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
allowed `int` to be supported in user code through the overloaded
`#defines`, but seems to have broken the `[u]int[8/16/32]_t` types.

The solution implemented by this patch is to explicitly use a new
_Generic mapping from all the `[u]int[8/16/32]_t` types to `int`. With this
change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
user code and are handled by the overloading mechanism correctly.
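
To illustrate the failure mode, here is a minimal sketch in plain C11. It is
not the `arm_mve.h` code: `CLASSIFY_INT_ONLY`, `CLASSIFY_ALL`, `SCALAR_OK` and
`SCALAR_UNDEF` are hypothetical names, with `SCALAR_UNDEF` standing in for the
`__ARM_undef` fallback. It only shows why a _Generic list that names `int`
alone rejects the narrower fixed-width types, and how naming them explicitly
avoids the fallback.

```c
/* Illustrative sketch only; none of these names come from arm_mve.h.  */
#include <assert.h>
#include <stdint.h>

/* Hypothetical result tags: SCALAR_UNDEF plays the role of the
   __ARM_undef fallback, SCALAR_OK a successful resolution.  */
enum { SCALAR_UNDEF = 0, SCALAR_OK = 1 };

/* Only `int` is listed, so every other scalar type reaches `default`.
   _Generic does not apply integer promotions to its controlling
   expression, so an int8_t operand does not match `int`.  */
#define CLASSIFY_INT_ONLY(x) \
  _Generic ((x), int: SCALAR_OK, default: SCALAR_UNDEF)

/* `int` is tried first; the nested selection then names each fixed-width
   type.  Nesting avoids a duplicate association on targets where
   int32_t is a typedef of int.  */
#define CLASSIFY_ALL(x)                                   \
  _Generic ((x), int: SCALAR_OK,                          \
            default: _Generic ((x),                       \
                               int8_t: SCALAR_OK,         \
                               uint8_t: SCALAR_OK,        \
                               int16_t: SCALAR_OK,        \
                               uint16_t: SCALAR_OK,       \
                               int32_t: SCALAR_OK,        \
                               uint32_t: SCALAR_OK,       \
                               default: SCALAR_UNDEF))

/* Compile-time checks of the two behaviours described above.  */
static_assert (CLASSIFY_INT_ONLY ((int8_t) 0) == SCALAR_UNDEF,
               "int8_t falls through an int-only list");
static_assert (CLASSIFY_ALL ((int8_t) 0) == SCALAR_OK,
               "int8_t resolves once it is named explicitly");
```

Compiling this translation unit with a C11 compiler is enough to check both
assertions. Whether `int32_t` itself matches the `int`-only list depends on
the target's typedefs, so the sketch deliberately does not assert on it.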

gcc/ChangeLog:

        * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
        (__arm_vaddq_m_n_s32): Likewise.
        (__arm_vaddq_m_n_s16): Likewise.
        (__arm_vaddq_m_n_u8): Likewise.
        (__arm_vaddq_m_n_u32): Likewise.
        (__arm_vaddq_m_n_u16): Likewise.
        (__arm_vaddq_m): Fix overloading.
        (__ARM_mve_coerce3): New.
---
 gcc/config/arm/arm_mve.h | 78 ++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 684f997520f..951dc25374b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
 }
@@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
 {
  return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
 {
  return __arm_vaddq_m_n_s32 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
 {
  return __arm_vaddq_m_n_s16 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
 {
  return __arm_vaddq_m_n_u8 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
 {
  return __arm_vaddq_m_n_u32 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
 {
  return __arm_vaddq_m_n_u16 (__inactive, __a, __b, __p);
 }
@@ -35657,6 +35657,8 @@ extern void *__ARM_undef;
     _Generic(param, type: param, const type: param, default: *(type *)__ARM_undef)
 #define __ARM_mve_coerce2(param, type) \
     _Generic(param, type: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef)
+#define __ARM_mve_coerce3(param, type) \
+    _Generic(param, type: param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef)
 
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
@@ -35871,14 +35873,14 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_f16 (__ARM_mve_coerce(p0, float16x8_t), __ARM_mve_coerce(p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_f32 (__ARM_mve_coerce(p0, float32x4_t), __ARM_mve_coerce(p1, float32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)));})
 
@@ -37316,12 +37318,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
 
@@ -38820,12 +38822,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int)));})
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39641,12 +39643,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-- 
2.25.1



* [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (12 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:51   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515] Andrea Corallo
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Stam Markianos-Wright

From: Stam Markianos-Wright <stam.markianos-wright@arm.com>

This is a mechanical patch that propagates the change proposed in
my previous patch for vaddq[_m]_n (13/35) to all other polymorphic
MVE intrinsic overloads that take scalar parameters.

The find-and-replace patterns used were:

s/__ARM_mve_coerce\(__p(\d+), [u]?int(8|16|32|64)_t\)
/__ARM_mve_coerce3(p$1, int)/g

s/__ARM_mve_coerce2\(__p(\d+), double\)
/__ARM_mve_coerce2(p$1, double)/g
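
For readers unfamiliar with the coercion macros, the following is a
minimal standalone sketch of the idea, not the arm_mve.h code itself.
It assumes a C11 compiler. The names vec_s32, vec_u8, add_n_s32,
add_n_u8, COERCE, COERCE_INT, vadd_n and sketch_undef are hypothetical
stand-ins for the MVE vector types, the _n intrinsics, __ARM_mve_coerce,
__ARM_mve_coerce3, the polymorphic macro and __ARM_undef. It lists the
built-in integer type names rather than the intN_t typedefs so that it
also builds on hosts where int32_t is a typedef for int.

```c
#include <assert.h>   /* only needed for checks like those in the note below */
#include <stdint.h>

/* Hypothetical stand-ins for two MVE vector types. */
typedef struct { int32_t lane[4]; } vec_s32;
typedef struct { uint8_t lane[16]; } vec_u8;

/* Declared but never defined: it is only referenced if a default
   association is actually selected, which then fails at link time. */
extern void *sketch_undef;

/* Exact-type coercion: pass param through only if it has type `type`. */
#define COERCE(param, type) \
  _Generic((param), type: (param), default: *(type *)sketch_undef)

/* Scalar coercion in the spirit of coerce3: any listed integer type
   passes through unchanged, anything else hits the default. */
#define COERCE_INT(param) \
  _Generic((param), \
    signed char: (param), short: (param), int: (param), \
    long: (param), long long: (param), \
    unsigned char: (param), unsigned short: (param), \
    unsigned int: (param), unsigned long: (param), \
    unsigned long long: (param), \
    default: *(int *)sketch_undef)

/* Hypothetical "_n" variants: add a scalar to every lane. */
static inline vec_s32 add_n_s32(vec_s32 a, int32_t b)
{
  for (int i = 0; i < 4; i++)
    a.lane[i] += b;
  return a;
}

static inline vec_u8 add_n_u8(vec_u8 a, uint8_t b)
{
  for (int i = 0; i < 16; i++)
    a.lane[i] = (uint8_t)(a.lane[i] + b);
  return a;
}

/* Polymorphic entry point: the vector type picks the variant.  Every
   association is type-checked, so the arguments of the unselected
   ones must still be valid expressions; the coercion macros
   guarantee that. */
#define vadd_n(a, b) \
  _Generic((a), \
    vec_s32: add_n_s32(COERCE(a, vec_s32), COERCE_INT(b)), \
    vec_u8:  add_n_u8(COERCE(a, vec_u8), COERCE_INT(b)))
```

With these definitions, vadd_n(v, s) with a vec_s32 v and a short s
resolves to add_n_s32, and vadd_n(u, n) with a vec_u8 u and an
unsigned long n resolves to add_n_u8. Passing a double as the scalar
would select the default association and is expected to fail at link
time on sketch_undef.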

gcc/ChangeLog:

        * config/arm/arm_mve.h (__arm_vaddq): Fix overloading.
        (__arm_vmulq): Likewise.
        (__arm_vcmpeqq): Likewise.
        (__arm_vcmpneq): Likewise.
        (__arm_vmaxnmavq): Likewise.
        (__arm_vmaxnmvq): Likewise.
        (__arm_vminnmavq): Likewise.
        (__arm_vsubq): Likewise.
        (__arm_vminnmvq): Likewise.
        (__arm_vrshlq): Likewise.
        (__arm_vqsubq): Likewise.
        (__arm_vqdmulltq): Likewise.
        (__arm_vqdmullbq): Likewise.
        (__arm_vqdmulhq): Likewise.
        (__arm_vqaddq): Likewise.
        (__arm_vhaddq): Likewise.
        (__arm_vhsubq): Likewise.
        (__arm_vqdmlashq): Likewise.
        (__arm_vqrdmlahq): Likewise.
        (__arm_vmlasq): Likewise.
        (__arm_vqdmlahq): Likewise.
        (__arm_vmaxnmavq_p): Likewise.
        (__arm_vmaxnmvq_p): Likewise.
        (__arm_vminnmavq_p): Likewise.
        (__arm_vminnmvq_p): Likewise.
        (__arm_vfmasq_m): Likewise.
        (__arm_vsetq_lane): Likewise.
        (__arm_vcmpneq_m): Likewise.
        (__arm_vhaddq_x): Likewise.
        (__arm_vhsubq_x): Likewise.
        (__arm_vqrdmlashq_m): Likewise.
        (__arm_vqdmlashq_m): Likewise.
        (__arm_vmlaldavaxq_p): Likewise.
        (__arm_vmlasq_m): Likewise.
        (__arm_vqdmulhq_m): Likewise.
        (__arm_vqdmulltq_m): Likewise.
        (__arm_viwdupq_m): Likewise.
        (__arm_viwdupq_u16): Likewise.
        (__arm_viwdupq_u32): Likewise.
        (__arm_viwdupq_u8): Likewise.
        (__arm_vdwdupq_m): Likewise.
        (__arm_vdwdupq_u16): Likewise.
        (__arm_vdwdupq_u32): Likewise.
        (__arm_vdwdupq_u8): Likewise.
        (__arm_vaddlvaq): Likewise.
        (__arm_vaddlvaq_p): Likewise.
        (__arm_vaddvaq): Likewise.
        (__arm_vaddvaq_p): Likewise.
        (__arm_vcmphiq_m): Likewise.
        (__arm_vmladavaq_p): Likewise.
        (__arm_vmladavaxq): Likewise.
        (__arm_vmlaldavaxq): Likewise.
        (__arm_vrmlaldavhaq_p): Likewise.
---
 gcc/config/arm/arm_mve.h | 1106 +++++++++++++++++++-------------------
 1 file changed, 553 insertions(+), 553 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 951dc25374b..fd1876b57a0 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35881,8 +35881,8 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)));})
 
 #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -35927,14 +35927,14 @@ extern void *__ARM_undef;
 #define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -35959,14 +35959,14 @@ extern void *__ARM_undef;
 #define __arm_vcmpeqq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpeqq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpeqq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpeqq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -35997,16 +35997,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpeqq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpeqq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double), p2));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double), p2));})
 
 #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36014,13 +36014,13 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgtq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgtq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgtq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpgtq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpgtq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)));})
 
 #define __arm_vcmpleq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36030,11 +36030,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpleq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpleq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpleq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)));})
 
 #define __arm_vcmpltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36042,25 +36042,25 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpltq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpltq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpltq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpltq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpltq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)));})
 
 #define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36115,8 +36115,8 @@ extern void *__ARM_undef;
 #define __arm_vmaxnmavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmavq_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmavq_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmavq_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmavq_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
 
 #define __arm_vmaxnmq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36127,14 +36127,14 @@ extern void *__ARM_undef;
 #define __arm_vmaxnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
 
 #define __arm_vmaxnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
 
 #define __arm_vminnmaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36145,8 +36145,8 @@ extern void *__ARM_undef;
 #define __arm_vminnmavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmavq_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmavq_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmavq_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmavq_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
 
 #define __arm_vbrsrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
@@ -36168,14 +36168,14 @@ extern void *__ARM_undef;
 #define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36188,8 +36188,8 @@ extern void *__ARM_undef;
 #define __arm_vminnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmvq_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmvq_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmvq_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmvq_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t)));})
 
 #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
@@ -36244,12 +36244,12 @@ extern void *__ARM_undef;
 #define __arm_vrshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36280,12 +36280,12 @@ extern void *__ARM_undef;
 #define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36336,12 +36336,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqrshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vqrdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36349,9 +36349,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqrdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqrdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqrdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vmlaldavxq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36384,8 +36384,8 @@ extern void *__ARM_undef;
 #define __arm_vqdmulltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulltq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulltq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
@@ -36398,17 +36398,17 @@ extern void *__ARM_undef;
 #define __arm_vqdmullbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
 #define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
@@ -36416,12 +36416,12 @@ extern void *__ARM_undef;
 #define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36454,12 +36454,12 @@ extern void *__ARM_undef;
 #define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36484,12 +36484,12 @@ extern void *__ARM_undef;
 #define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -36632,12 +36632,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vsriq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36716,44 +36716,44 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqdmlashq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqrdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vmlasq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqrdmladhxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36943,11 +36943,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgtq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double), p2), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpgtq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpgtq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
@@ -36959,11 +36959,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpleq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpleq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpleq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double), p2));})
 
 #define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36973,11 +36973,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpltq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpltq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpltq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double), p2));})
 
 #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -36990,14 +36990,14 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpneq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpneq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpneq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double), p2));})
 
 #define __arm_vcvtbq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37051,8 +37051,8 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vfmaq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vfmaq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
 
@@ -37067,8 +37067,8 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double)));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double)));})
 
 #define __arm_vmaxnmaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37091,14 +37091,14 @@ extern void *__ARM_undef;
 #define __arm_vmaxnmavq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmavq_p_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmavq_p_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmavq_p_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmavq_p_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
 #define __arm_vmaxnmvq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmvq_p_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmvq_p_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vmaxnmvq_p_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vmaxnmvq_p_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
 #define __arm_vminnmaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37109,14 +37109,14 @@ extern void *__ARM_undef;
 #define __arm_vminnmavq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmavq_p_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmavq_p_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmavq_p_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmavq_p_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
 #define __arm_vminnmvq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmvq_p_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmvq_p_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vminnmvq_p_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vminnmvq_p_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
 #define __arm_vrndnq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37178,13 +37178,13 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgeq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgeq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgeq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpgeq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpgeq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double)));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)));})
 
 #define __arm_vrshrnbq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37285,11 +37285,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgeq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgeq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgeq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(__p1, double), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(__p1, double), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vcmpgeq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double), p2), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpgeq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpgeq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
@@ -37324,8 +37324,8 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37466,15 +37466,15 @@ extern void *__ARM_undef;
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vfmaq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vfmaq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vfmasq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vfmsq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37509,14 +37509,14 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vmulq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vmulq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -37543,14 +37543,14 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -38023,19 +38023,19 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -38158,19 +38158,19 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vmulq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vmulq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
@@ -38258,8 +38258,8 @@ extern void *__ARM_undef;
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
 
 #define __arm_vcmulq_rot90_x(p1,p2,p3)  ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -38283,16 +38283,16 @@ extern void *__ARM_undef;
 #define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vsetq_lane_f16 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
-  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vsetq_lane_f32 (__ARM_mve_coerce2(__p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int64x2_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint64x2_t), p2), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]: __arm_vsetq_lane_f16 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]: __arm_vsetq_lane_f32 (__ARM_mve_coerce2(p0, double), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
 #else /* MVE Integer.  */
 
@@ -38410,12 +38410,12 @@ extern void *__ARM_undef;
 #define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38442,12 +38442,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
@@ -38461,12 +38461,12 @@ extern void *__ARM_undef;
 #define __arm_vrshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38497,12 +38497,12 @@ extern void *__ARM_undef;
 #define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38571,12 +38571,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqrshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vqrdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -38584,16 +38584,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqrdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqrdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqrdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
@@ -38601,12 +38601,12 @@ extern void *__ARM_undef;
 #define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38637,12 +38637,12 @@ extern void *__ARM_undef;
 #define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38717,12 +38717,12 @@ extern void *__ARM_undef;
 #define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38747,12 +38747,12 @@ extern void *__ARM_undef;
 #define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38858,12 +38858,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpeqq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpeqq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpeqq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vqmovntq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -38944,16 +38944,16 @@ extern void *__ARM_undef;
 #define __arm_vqdmulltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulltq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulltq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
 #define __arm_vqdmullbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
@@ -38963,9 +38963,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgeq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgeq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgeq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -38973,9 +38973,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgtq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgtq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgtq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vcmpleq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -38983,9 +38983,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpleq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpleq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpleq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vcmpltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -38993,20 +38993,20 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpltq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpltq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpltq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
@@ -39031,12 +39031,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 #define __arm_vbicq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
@@ -39146,25 +39146,25 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqdmlashq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqrdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqrdmladhxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39227,9 +39227,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgeq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgeq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgeq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 
 #define __arm_vcmpgtq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
@@ -39238,9 +39238,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 #define __arm_vcmpleq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39248,9 +39248,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpleq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpleq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 #define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39258,9 +39258,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpltq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpltq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39271,12 +39271,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpneq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 #define __arm_vdupq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39299,23 +39299,23 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vmlasq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vnegq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39340,9 +39340,9 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
 
 #define __arm_vqdmlsdhq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39505,12 +39505,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -39610,12 +39610,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
 #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -39660,12 +39660,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -40002,15 +40002,15 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -40104,15 +40104,15 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
@@ -40234,14 +40234,14 @@ extern void *__ARM_undef;
 #define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int64x2_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint64x2_t), p2));})
 
 #endif /* MVE Integer.  */
 
@@ -40421,12 +40421,12 @@ extern void *__ARM_undef;
 #define __arm_vhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -40451,12 +40451,12 @@ extern void *__ARM_undef;
 #define __arm_vhsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -40576,25 +40576,25 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vqrdmlashq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vqdmlashq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vqrshlq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -40695,12 +40695,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -40715,9 +40715,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqrdmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqrdmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqrdmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vqrdmlsdhxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -40843,17 +40843,17 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaq_p_s16 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmlaldavaq_p_u16 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmlaldavaq_p_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmlaldavaq_p_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmlaldavaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
 #define __arm_vmlaldavaxq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaxq_p_s16 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaxq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaxq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaxq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
 #define __arm_vmlsldavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -40992,12 +40992,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -41031,12 +41031,12 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vmaxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41064,23 +41064,23 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vmlasq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41126,12 +41126,12 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
@@ -41143,17 +41143,17 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vqdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
@@ -41164,15 +41164,15 @@ extern void *__ARM_undef;
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmullbq_m_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmullbq_m_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
 #define __arm_vqdmulltq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
-  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulltq_m_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulltq_m_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
@@ -41238,9 +41238,9 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaxq_p_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaxq_p_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaxq_p_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaxq_p_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaxq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaxq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
 #define __arm_vmullbq_poly_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41311,51 +41311,51 @@ extern void *__ARM_undef;
 #define __arm_viwdupq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_m_wb_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_m_wb_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_m_wb_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4));})
 
 #define __arm_viwdupq_u16(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u16 (__ARM_mve_coerce(__p0, uint32_t), p1, (const int) p2), \
+  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u16 (__ARM_mve_coerce3(p0, int), p1, (const int) p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u16 (__ARM_mve_coerce(__p0, uint32_t *), p1, (const int) p2));})
 
 #define __arm_viwdupq_u32(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u32 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
+  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u32 (__ARM_mve_coerce3(p0, int), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
 
 #define __arm_viwdupq_u8(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u8 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
+  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u8 (__ARM_mve_coerce3(p0, int), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u8 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
 
 #define __arm_vdwdupq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_m_wb_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_m_wb_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_m_wb_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4));})
 
 #define __arm_vdwdupq_u16(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u16 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
+  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u16 (__ARM_mve_coerce3(p0, int), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u16 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
 
 #define __arm_vdwdupq_u32(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u32 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
+  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u32 (__ARM_mve_coerce3(p0, int), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
 
 #define __arm_vdwdupq_u8(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u8 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
+  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u8 (__ARM_mve_coerce3(p0, int), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u8 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
 
 #define __arm_vshlcq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
@@ -41392,14 +41392,14 @@ extern void *__ARM_undef;
 #define __arm_vaddlvaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
 #define __arm_vaddlvaq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_p_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
 #define __arm_vaddlvq(p0) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
@@ -41414,22 +41414,22 @@ extern void *__ARM_undef;
 #define __arm_vaddvaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
 #define __arm_vaddvaq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_p_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_p_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_p_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_p_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_p_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_p_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_p_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_p_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_p_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
 #define __arm_vaddvq(p0) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
@@ -41455,9 +41455,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpcsq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpcsq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpcsq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)));})
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vcmpcsq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41465,9 +41465,9 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpcsq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpcsq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpcsq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpcsq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2));})
 
 #define __arm_vcmphiq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41475,16 +41475,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmphiq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmphiq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmphiq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmphiq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmphiq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmphiq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)));})
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmphiq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmphiq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmphiq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vcmphiq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmphiq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmphiq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmphiq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmphiq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmphiq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmphiq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmphiq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmphiq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmphiq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
@@ -41581,34 +41581,34 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaq_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaq_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaq_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaq_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaq_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaq_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaq_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaq_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaq_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaq_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
 
 #define __arm_vmladavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
 #define __arm_vmladavaxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaxq_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaxq_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaxq_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaxq_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaxq_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaxq_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaxq_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaxq_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaxq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmladavaxq_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmladavaxq_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmladavaxq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
 
 #define __arm_vmladavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41651,17 +41651,17 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaq_s16 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaq_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmlaldavaq_u16 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmlaldavaq_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaq_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmlaldavaq_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmlaldavaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
 
 #define __arm_vmlaldavaxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaxq_s16 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaxq_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaxq_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaxq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)));})
 
 #define __arm_vmlaldavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -41856,15 +41856,15 @@ extern void *__ARM_undef;
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
 
 #define __arm_vrmlaldavhaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmlaldavhaq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmlaldavhaq_p_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmlaldavhaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmlaldavhaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
 #define __arm_vstrbq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
-- 
2.25.1


* [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (13 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-18 16:58   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 16/35] arm: Add integer vector overloading of vsubq_x intrinsic Andrea Corallo
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Stam Markianos-Wright

From: Stam Markianos-Wright <stam.markianos-wright@arm.com>

This patch adds explicit associations for the remaining float types
(_Float16, __fp16 and float) to __ARM_mve_typeid in arm_mve.h, so that
they are classified as __ARM_mve_type_fp_n.  Resolves PR 107515:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
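
For illustration, the effect can be reproduced on any C11 host with a
cut-down selection.  This is only a sketch: the DEMO_* names and the
DEMO_TYPE_UNKNOWN fallback are hypothetical stand-ins rather than the
real arm_mve.h definitions, and _Float16/__fp16 are left out because
not every host compiler provides them.

```c
#include <assert.h>

/* Hypothetical stand-ins for the __ARM_mve_type_* enumerators.  */
enum { DEMO_TYPE_UNKNOWN, DEMO_TYPE_INT_N, DEMO_TYPE_FP_N };

/* Before: only double is listed, so a float scalar reaches `default'.  */
#define DEMO_TYPEID_OLD(x) _Generic ((x), \
	int: DEMO_TYPE_INT_N, \
	double: DEMO_TYPE_FP_N, \
	default: DEMO_TYPE_UNKNOWN)

/* After: float has its own association and is classified as FP.  */
#define DEMO_TYPEID_NEW(x) _Generic ((x), \
	int: DEMO_TYPE_INT_N, \
	float: DEMO_TYPE_FP_N, \
	double: DEMO_TYPE_FP_N, \
	default: DEMO_TYPE_UNKNOWN)

/* Call from main () to exercise both selections.  */
void
demo_typeid_check (void)
{
  float f = 1.0f;
  double d = 1.0;

  assert (DEMO_TYPEID_OLD (f) == DEMO_TYPE_UNKNOWN);
  assert (DEMO_TYPEID_NEW (f) == DEMO_TYPE_FP_N);
  assert (DEMO_TYPEID_OLD (d) == DEMO_TYPE_FP_N);
  assert (DEMO_TYPEID_NEW (d) == DEMO_TYPE_FP_N);
}
```

_Generic matches on the exact type of the controlling expression and
applies no float-to-double promotion, which is why each float type
needs its own association.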

gcc/ChangeLog:
        PR 107515
        * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
---
 gcc/config/arm/arm_mve.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index fd1876b57a0..f6b42dc3fab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
 	short: __ARM_mve_type_int_n, \
 	int: __ARM_mve_type_int_n, \
 	long: __ARM_mve_type_int_n, \
+	_Float16: __ARM_mve_type_fp_n, \
+	__fp16: __ARM_mve_type_fp_n, \
+	float: __ARM_mve_type_fp_n, \
 	double: __ARM_mve_type_fp_n, \
 	long long: __ARM_mve_type_int_n, \
 	unsigned char: __ARM_mve_type_int_n, \
-- 
2.25.1


* [PATCH 16/35] arm: Add integer vector overloading of vsubq_x intrinsic
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (14 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515] Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 10:00   ` Christophe Lyon
  2022-11-22 16:48   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 17/35] arm: improve tests and fix vadd* Andrea Corallo
                   ` (19 subsequent siblings)
  35 siblings, 2 replies; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Stam Markianos-Wright

From: Stam Markianos-Wright <stam.markianos-wright@arm.com>

Previously the vsubq_x generic overload of the vsubq_x_* intrinsics
was only defined for float vector types.  Integer vector arguments
would therefore fall back to the `__ARM_undef` failure state when
called through the generic version.
This patch simply adds these overloads.
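
The dispatch idiom involved can be sketched on any C11 host as below.
All DEMO_*/demo_* names are hypothetical stand-ins for the arm_mve.h
machinery, the struct types merely imitate MVE vectors, and the -1
returned by demo_unmatched () is just a visible marker for the failure
path, not how arm_mve.h reports it.

```c
#include <assert.h>

/* Hypothetical stand-ins for MVE vector types and their type ids.
   Ids start at 1 because they are used as array bounds below.  */
typedef struct { int lane[4]; } demo_i32x4;
typedef struct { float lane[4]; } demo_f32x4;
enum { DEMO_ID_OTHER = 1, DEMO_ID_I32X4, DEMO_ID_F32X4 };

#define DEMO_TYPEID(x) _Generic ((x), \
  demo_i32x4: DEMO_ID_I32X4, \
  demo_f32x4: DEMO_ID_F32X4, \
  default: DEMO_ID_OTHER)

/* Never dereferenced at run time: it only gives unselected associations
   an operand of the right type so that they still type-check.  */
void *demo_undef;
#define DEMO_COERCE(p, T) _Generic ((p), T: (p), default: *(T *) demo_undef)

static int demo_sub_s32 (demo_i32x4 a, demo_i32x4 b) { (void) a; (void) b; return 1; }
static int demo_sub_f32 (demo_f32x4 a, demo_f32x4 b) { (void) a; (void) b; return 2; }
static int demo_unmatched (void) { return -1; }

/* Before: only the float pair is associated.  */
#define DEMO_SUB_OLD(a, b) \
  _Generic ((int (*)[DEMO_TYPEID (a)][DEMO_TYPEID (b)]) 0, \
  int (*)[DEMO_ID_F32X4][DEMO_ID_F32X4]: \
    demo_sub_f32 (DEMO_COERCE (a, demo_f32x4), DEMO_COERCE (b, demo_f32x4)), \
  default: demo_unmatched ())

/* After: the integer pair is associated as well.  */
#define DEMO_SUB_NEW(a, b) \
  _Generic ((int (*)[DEMO_TYPEID (a)][DEMO_TYPEID (b)]) 0, \
  int (*)[DEMO_ID_I32X4][DEMO_ID_I32X4]: \
    demo_sub_s32 (DEMO_COERCE (a, demo_i32x4), DEMO_COERCE (b, demo_i32x4)), \
  int (*)[DEMO_ID_F32X4][DEMO_ID_F32X4]: \
    demo_sub_f32 (DEMO_COERCE (a, demo_f32x4), DEMO_COERCE (b, demo_f32x4)), \
  default: demo_unmatched ())

/* Call from main () to exercise both macros.  */
void
demo_sub_check (void)
{
  demo_i32x4 vi = { { 1, 2, 3, 4 } };
  demo_f32x4 vf = { { 1.0f, 2.0f, 3.0f, 4.0f } };

  assert (DEMO_SUB_OLD (vf, vf) == 2);   /* float pair always matched */
  assert (DEMO_SUB_OLD (vi, vi) == -1);  /* integer pair fell through */
  assert (DEMO_SUB_NEW (vi, vi) == 1);   /* integer pair now matched */
  assert (DEMO_SUB_NEW (vf, vf) == 2);
}
```

Encoding both argument type ids as the bounds of a pointer-to-array
type lets a single _Generic select on the pair of argument types.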

gcc/ChangeLog:

        * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
         (__arm_vsubq_x Integer): New.
---
 gcc/config/arm/arm_mve.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f6b42dc3fab..09167ec118e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -38259,6 +38259,18 @@ extern void *__ARM_undef;
 #define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
@@ -40223,6 +40235,22 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce1(p0, uint16_t *)), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce1(p0, uint32_t *))))
 
+#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
+
 #define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
-- 
2.25.1



* [PATCH 17/35] arm: improve tests and fix vadd*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (15 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 16/35] arm: Add integer vector overloading of vsubq_x intrinsic Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:49   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 18/35] arm: improve tests for vmulq* Andrea Corallo
                   ` (18 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

Improve the vadd* MVE intrinsic tests so that they check the emitted
function bodies instead of only scanning for a mnemonic, and fix the
spacing after the mnemonic in the corresponding mve.md patterns.

gcc/ChangeLog:

	* config/arm/mve.md (mve_vaddlvq_p_<supf>v4si)
	(mve_vaddq_n_<supf><mode>, mve_vaddvaq_<supf><mode>)
	(mve_vaddlvaq_<supf>v4si, mve_vaddq_n_f<mode>)
	(mve_vaddlvaq_p_<supf>v4si, mve_vaddq<mode>, mve_vaddq_f<mode>):
	Fix spacing.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_u8.c: Likewise.
---
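
Illustrative note, not part of the patch: the new expected bodies
require a tab between the mnemonic and its operands, which is why the
mve.md templates in this patch switch from a space to \t.  The Python
sketch below only approximates how a single expected line is matched.
It assumes that Python's re module treats this pattern the same way as
the Tcl regexps used by the testsuite; line_matches is a hypothetical
helper, and the sample assembler lines are made up rather than real
compiler output.

```python
import re

# Expected line for foo in vaddlvaq_p_s32.c, with the literal tabs of the
# test file written as \t.  The trailing group allows an optional
# "<tab>@ comment" emitted by the assembler printer.
EXPECTED = (
    r"\tvaddlvat.s32\t(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), "
    r"q[0-9]+(?:\t@.*|)"
)


def line_matches(pattern: str, line: str) -> bool:
    """Return True if the whole assembler line matches the pattern."""
    return re.fullmatch(pattern, line) is not None


# Hypothetical assembler lines (register numbers are invented).
with_tab = "\tvaddlvat.s32\tr0, r1, q0"
with_comment = "\tvaddlvat.s32\tip, fp, q3\t@ some comment"
with_space = "\tvaddlvat.s32 r0, r1, q0"

print(line_matches(EXPECTED, with_tab))
print(line_matches(EXPECTED, with_comment))
print(line_matches(EXPECTED, with_space))

# The scan-assembler directive removed by this patch only searched for
# the mnemonic, so it accepted the space-separated form as well.
print(re.search("vaddlvat.s32", with_space) is not None)
```

The third line is the case the backend change addresses: a template
that separates mnemonic and operands with a space would satisfy the old
mnemonic-only scan but not a tab-based expected body.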
 gcc/config/arm/mve.md                         | 18 ++++----
 .../arm/mve/intrinsics/vaddlvaq_p_s32.c       | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddlvaq_p_u32.c       | 40 +++++++++++++++++-
 .../arm/mve/intrinsics/vaddlvaq_s32.c         | 16 ++++++-
 .../arm/mve/intrinsics/vaddlvaq_u32.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddlvq_p_s32.c        | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddlvq_p_u32.c        | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddlvq_s32.c          | 22 +++++++---
 .../arm/mve/intrinsics/vaddlvq_u32.c          | 20 +++++++--
 .../gcc.target/arm/mve/intrinsics/vaddq_f16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_f32.c | 16 ++++++-
 .../arm/mve/intrinsics/vaddq_m_f16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_f32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_f16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_f32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_s16.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_s32.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_s8.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_u16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_u32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_m_n_u8.c         | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_m_s16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_s32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_s8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_u16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_u32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_m_u8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_n_f16.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddq_n_f32.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddq_n_s16.c          | 16 ++++++-
 .../arm/mve/intrinsics/vaddq_n_s32.c          | 16 ++++++-
 .../arm/mve/intrinsics/vaddq_n_s8.c           | 16 ++++++-
 .../arm/mve/intrinsics/vaddq_n_u16.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddq_n_u32.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddq_n_u8.c           | 28 ++++++++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_s16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_s32.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_s8.c  | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_u16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_u32.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vaddq_u8.c  | 16 ++++++-
 .../arm/mve/intrinsics/vaddq_x_f16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_f32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_f16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_f32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_s16.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_s32.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_s8.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_u16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_u32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_x_n_u8.c         | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vaddq_x_s16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_s32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_s8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_u16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_u32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddq_x_u8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vaddvaq_p_s16.c        | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvaq_p_s32.c        | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvaq_p_s8.c         | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvaq_p_u16.c        | 40 +++++++++++++++++-
 .../arm/mve/intrinsics/vaddvaq_p_u32.c        | 40 +++++++++++++++++-
 .../arm/mve/intrinsics/vaddvaq_p_u8.c         | 40 +++++++++++++++++-
 .../arm/mve/intrinsics/vaddvaq_s16.c          | 16 ++++++-
 .../arm/mve/intrinsics/vaddvaq_s32.c          | 16 ++++++-
 .../arm/mve/intrinsics/vaddvaq_s8.c           | 16 ++++++-
 .../arm/mve/intrinsics/vaddvaq_u16.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddvaq_u32.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddvaq_u8.c           | 28 ++++++++++++-
 .../arm/mve/intrinsics/vaddvq_p_s16.c         | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvq_p_s32.c         | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvq_p_s8.c          | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvq_p_u16.c         | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvq_p_u32.c         | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvq_p_u8.c          | 24 ++++++++++-
 .../arm/mve/intrinsics/vaddvq_s16.c           | 22 +++++++---
 .../arm/mve/intrinsics/vaddvq_s32.c           | 22 +++++++---
 .../gcc.target/arm/mve/intrinsics/vaddvq_s8.c | 20 +++++++--
 .../arm/mve/intrinsics/vaddvq_u16.c           | 20 +++++++--
 .../arm/mve/intrinsics/vaddvq_u32.c           | 20 +++++++--
 .../gcc.target/arm/mve/intrinsics/vaddvq_u8.c | 20 +++++++--
 81 files changed, 1864 insertions(+), 252 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index bc4e2f2ac21..5ce2a289225 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -636,7 +636,7 @@ (define_insn "mve_vaddlvq_<supf>v4si"
 	 VADDLVQ))
   ]
   "TARGET_HAVE_MVE"
-  "vaddlv.<supf>32 %Q0, %R0, %q1"
+  "vaddlv.<supf>32\t%Q0, %R0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -817,7 +817,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
 	 VADDLVQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddlvt.<supf>32 %Q0, %R0, %q1"
+  "vpst\;vaddlvt.<supf>32\t%Q0, %R0, %q1"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -879,7 +879,7 @@ (define_insn "mve_vaddq_n_<supf><mode>"
 	 VADDQ_N))
   ]
   "TARGET_HAVE_MVE"
-  "vadd.i%#<V_sz_elem>	%q0, %q1, %2"
+  "vadd.i%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -894,7 +894,7 @@ (define_insn "mve_vaddvaq_<supf><mode>"
 	 VADDVAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vaddva.<supf>%#<V_sz_elem>	%0, %q2"
+  "vaddva.<supf>%#<V_sz_elem>\t%0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1834,7 +1834,7 @@ (define_insn "mve_vaddlvaq_<supf>v4si"
 	 VADDLVAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vaddlva.<supf>32 %Q0, %R0, %q2"
+  "vaddlva.<supf>32\t%Q0, %R0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1849,7 +1849,7 @@ (define_insn "mve_vaddq_n_f<mode>"
 	 VADDQ_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vadd.f%#<V_sz_elem>	%q0, %q1, %2"
+  "vadd.f%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3717,7 +3717,7 @@ (define_insn "mve_vaddlvaq_p_<supf>v4si"
 	 VADDLVAQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddlvat.<supf>32 %Q0, %R0, %q2"
+  "vpst\;vaddlvat.<supf>32\t%Q0, %R0, %q2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 ;;
@@ -8928,7 +8928,7 @@ (define_insn "mve_vaddq<mode>"
 		    (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vadd.i%#<V_sz_elem>  %q0, %q1, %q2"
+  "vadd.i%#<V_sz_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -8942,7 +8942,7 @@ (define_insn "mve_vaddq_f<mode>"
 		    (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vadd.f%#<V_sz_elem> %q0, %q1, %q2"
+  "vadd.f%#<V_sz_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
index 0991ac1b355..3a9504df94e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo (int64_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddlvaq_p_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvat.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo1 (int64_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddlvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvat.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
index 5af786e8e76..6e2613ee099 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo (uint64_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddlvaq_p_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvat.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo1 (uint64_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddlvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvat.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint64_t
+foo2 (uint32x4_t b, mve_pred16_t p)
+{
+  return vaddlvaq_p (1, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
index 78f155f1586..180dc9b2deb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddlva.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo (int64_t a, int32x4_t b)
 {
   return vaddlvaq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddlva.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddlva.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo1 (int64_t a, int32x4_t b)
 {
   return vaddlvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddlva.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
index a7dfa2541ab..1f899e92c3c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddlva.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo (uint64_t a, uint32x4_t b)
 {
   return vaddlvaq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddlva.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddlva.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo1 (uint64_t a, uint32x4_t b)
 {
   return vaddlvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddlva.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vaddlva.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint64_t
+foo2 (uint32x4_t b)
+{
+  return vaddlvaq (1, b);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
index 8aa18323b53..5b22da49c1d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo (int32x4_t a, mve_pred16_t p)
 {
   return vaddlvq_p_s32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo1 (int32x4_t a, mve_pred16_t p)
 {
   return vaddlvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
index a9cee74e2ee..2c85139435a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvt.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo (uint32x4_t a, mve_pred16_t p)
 {
   return vaddlvq_p_u32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddlvt.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo1 (uint32x4_t a, mve_pred16_t p)
 {
   return vaddlvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
index 4bd70aacc05..bdb04b5214f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
@@ -1,21 +1,33 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddlv.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo (int32x4_t a)
 {
   return vaddlvq_s32 (a);
 }
 
-/* { dg-final { scan-assembler "vaddlv.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddlv.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo1 (int32x4_t a)
 {
-  return vaddlvq_s32 (a);
+  return vaddlvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddlv.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
index 2148bd9a32e..bcd9d21df4f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddlv.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo (uint32x4_t a)
 {
-    return vaddlvq_u32 (a);
+  return vaddlvq_u32 (a);
 }
 
-/* { dg-final { scan-assembler "vaddlv.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddlv.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo1 (uint32x4_t a)
 {
-    return vaddlvq (a);
+  return vaddlvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddlv.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
index 3d1100a9e81..58462177473 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vaddq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
index e15e0d13e4f..f3fcd286f4d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vaddq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
index 51d7020bd1f..291e65f32cc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vaddq_m_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
index 7821bc241ff..0346f65a330 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vaddq_m_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
index 796bed47613..9d57bbd27b9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vaddq_m_n_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vaddq_m (inactive, a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
index afa3c4c722e..9939aa0012d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vaddq_m_n_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vaddq_m (inactive, a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
index 0ef433724ba..50b138fc763 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vaddq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
index 46ac88e940d..66c2be777ce 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vaddq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
index 1867d5603d1..87dba75dff1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vaddq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
index 1da993b5e31..a8e9ea576b3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vaddq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
index d7404c9f4ce..045e5024d5d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vaddq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
index 013e83938b2..3d17afcbe56 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vaddq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
index 244c88fcf89..87210a41dae 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vaddq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
index 7a59d75af11..1acb0b67fa9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
index 5b8c74ab017..6136c54cbb8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vaddq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
index f28e3d789ab..b60d98e0691 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vaddq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
index aeb836ce87d..d56bbae9b03 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
index c698df3a146..9f0b623c3e8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vaddq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
index 024fab5c0b2..5df23a6e61f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16_t b)
 {
   return vaddq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vadd.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t a)
+{
+  return vaddq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
index 06b1528460e..d07927c427e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32_t b)
 {
   return vaddq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vadd.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t a)
+{
+  return vaddq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
index 63765f41deb..9ae30406f51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vaddq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
index e462fbfab8e..3271d4d5af1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vaddq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
index ad7181fd8f5..119fd5d5528 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vaddq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
index dac7a9fb9ba..ef0722e4dcd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vaddq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
index 2f1feb89d32..67513819f39 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vaddq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
index 325bdade765..2aa79e5e916 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vaddq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
index 31f6cb42e9f..24b12a6aee1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vaddq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
index 96aead168cc..3fdfa3d86e6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vaddq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
index 6676a2e269b..6b32b8ccfd5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vaddq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
index 1b19876e09a..0deefa14ac6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vaddq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
index 8f5acc69e79..44df963f0f8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vaddq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
index e5be2fa1b59..7349fa165bf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vaddq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vadd.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c
index bd2a198eb72..b1d48a1d260 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vaddq_x_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c
index 5369f4d4876..047043d6526 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vaddq_x_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c
index d2eed8cf66f..ed67007df51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vaddq_x_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vaddq_x (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c
index 40d56da12b1..fa17d6b4aa2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vaddq_x_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vaddq_x (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c
index e974cdf914b..d6c3252132a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vaddq_x_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c
index a6ac9ccd3af..c2a861706d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vaddq_x_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c
index f5539ef9c67..abc90a4c86b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vaddq_x_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c
index f167df122a0..8866a07bc8e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vaddq_x_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vaddq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c
index 653c3eed7a0..4123ad594ed 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vaddq_x_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vaddq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c
index 0ad65c8dde5..d610930a311 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vaddq_x_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vaddq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c
index 75b1491e17d..323010a6d33 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vaddq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c
index 1aadebda459..98773e7ba6f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c
index d6b07cee79a..bff0bda1109 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vaddq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c
index 5c9abc2492a..85f5cd4db7a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vaddq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c
index d55ec735460..ad0e7afbc39 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c
index bcc058b3769..a3cfc5686e2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vaddq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vaddt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
index c4bfe34aa91..16b51514be1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int16x8_t b, mve_pred16_t p)
 {
   return vaddvaq_p_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int16x8_t b, mve_pred16_t p)
 {
   return vaddvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
index cdc32807a24..bbf04aa0d08 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddvaq_p_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int32x4_t b, mve_pred16_t p)
 {
   return vaddvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
index d330411115a..f06623b1893 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int8x16_t b, mve_pred16_t p)
 {
   return vaddvaq_p_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int8x16_t b, mve_pred16_t p)
 {
   return vaddvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
index 74d9246cd63..7bfb4bb9cbe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vaddvaq_p_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vaddvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint16x8_t b, mve_pred16_t p)
+{
+  return vaddvaq_p (1, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
index e4ec42b2544..9aea5caa4fe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddvaq_p_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vaddvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint32x4_t b, mve_pred16_t p)
+{
+  return vaddvaq_p (1, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
index f9bed8379a4..b5113b209c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vaddvaq_p_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vaddvaq_p (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vaddvat.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvat.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint8x16_t b, mve_pred16_t p)
+{
+  return vaddvaq_p (1, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
index 5f6a8cf9d89..1b9af185a0d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddva.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int16x8_t b)
 {
   return vaddvaq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddva.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int16x8_t b)
 {
   return vaddvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
index 29e27f59328..e25487954d2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddva.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int32x4_t b)
 {
   return vaddvaq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddva.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int32x4_t b)
 {
   return vaddvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
index cac43464679..d37c916c94d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddva.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32_t a, int8x16_t b)
 {
   return vaddvaq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddva.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32_t a, int8x16_t b)
 {
   return vaddvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
index c943fa5789f..b3583ce5725 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddva.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint16x8_t b)
 {
   return vaddvaq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddva.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint16x8_t b)
 {
   return vaddvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vaddva.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint16x8_t b)
+{
+  return vaddvaq (1, b);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
index 0950ff50d0f..006c0a3734f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddva.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint32x4_t b)
 {
   return vaddvaq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddva.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint32x4_t b)
 {
   return vaddvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vaddva.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint32x4_t b)
+{
+  return vaddvaq (1, b);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
index 2a58225fbe3..cfe29bfd7be 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddva.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32_t a, uint8x16_t b)
 {
   return vaddvaq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddva.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32_t a, uint8x16_t b)
 {
   return vaddvaq (a, b);
 }
 
-/* { dg-final { scan-assembler "vaddva.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vaddva.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint8x16_t b)
+{
+  return vaddvaq (1, b);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
index a786b8974b7..3d19b46fdc6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int16x8_t a, mve_pred16_t p)
 {
   return vaddvq_p_s16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int16x8_t a, mve_pred16_t p)
 {
   return vaddvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
index c688782180f..a148d15ead1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32x4_t a, mve_pred16_t p)
 {
   return vaddvq_p_s32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32x4_t a, mve_pred16_t p)
 {
   return vaddvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
index 8438448f86c..f0b0c499d0d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int8x16_t a, mve_pred16_t p)
 {
   return vaddvq_p_s8 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int8x16_t a, mve_pred16_t p)
 {
   return vaddvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
index ec7a5fa5a7f..2fb316c50ab 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint16x8_t a, mve_pred16_t p)
 {
   return vaddvq_p_u16 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint16x8_t a, mve_pred16_t p)
 {
   return vaddvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
index b70968880ce..24bde90ec77 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32x4_t a, mve_pred16_t p)
 {
   return vaddvq_p_u32 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32x4_t a, mve_pred16_t p)
 {
   return vaddvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
index 69381b78cc4..f6710941119 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint8x16_t a, mve_pred16_t p)
 {
   return vaddvq_p_u8 (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vaddvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint8x16_t a, mve_pred16_t p)
 {
   return vaddvq_p (a, p);
 }
 
-/* { dg-final { scan-assembler "vaddvt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
index b4fc11f4aa4..6b9a99f2b07 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
@@ -1,21 +1,33 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int16x8_t a)
 {
   return vaddvq_s16 (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int16x8_t a)
 {
-  return vaddvq_s16 (a);
+  return vaddvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
index 438b46ec246..50823b65ecc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
@@ -1,21 +1,33 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int32x4_t a)
 {
   return vaddvq_s32 (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int32x4_t a)
 {
-  return vaddvq_s32 (a);
+  return vaddvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
index b60b1f2da98..131edbe2b3f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
@@ -1,21 +1,33 @@
-/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo (int8x16_t a)
 {
   return vaddvq_s8 (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
 foo1 (int8x16_t a)
 {
   return vaddvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
index de782127faf..7c0ac0e1395 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint16x8_t a)
 {
-    return vaddvq_u16 (a);
+  return vaddvq_u16 (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint16x8_t a)
 {
-    return vaddvq (a);
+  return vaddvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
index c4672e42288..40779ed0f99 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint32x4_t a)
 {
-    return vaddvq_u32 (a);
+  return vaddvq_u32 (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint32x4_t a)
 {
-    return vaddvq (a);
+  return vaddvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
index e4e149cfb61..d2a6ba8f0fb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vaddv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo (uint8x16_t a)
 {
-    return vaddvq_u8 (a);
+  return vaddvq_u8 (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vaddv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
 foo1 (uint8x16_t a)
 {
-    return vaddvq (a);
+  return vaddvq (a);
 }
 
-/* { dg-final { scan-assembler "vaddv.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 18/35] arm: improve tests for vmulq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (16 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 17/35] arm: improve tests and fix vadd* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:51   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 19/35] arm: improve tests and fix vsubq* Andrea Corallo
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vmulq_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vmulq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_u8.c: Likewise.
---
 .../gcc.target/arm/mve/intrinsics/vmulq_f16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_f32.c | 16 ++++++-
 .../arm/mve/intrinsics/vmulq_m_f16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_f32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_f16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_f32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_s16.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_s32.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_s8.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_u16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_u32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_m_n_u8.c         | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_m_s16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_s32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_s8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_u16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_u32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_m_u8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_n_f16.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vmulq_n_f32.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vmulq_n_s16.c          | 16 ++++++-
 .../arm/mve/intrinsics/vmulq_n_s32.c          | 16 ++++++-
 .../arm/mve/intrinsics/vmulq_n_s8.c           | 16 ++++++-
 .../arm/mve/intrinsics/vmulq_n_u16.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vmulq_n_u32.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vmulq_n_u8.c           | 28 ++++++++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_s16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_s32.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_s8.c  | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_u16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_u32.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vmulq_u8.c  | 16 ++++++-
 .../arm/mve/intrinsics/vmulq_x_f16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_f32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_f16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_f32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_s16.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_s32.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_s8.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_u16.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_u32.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_x_n_u8.c         | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vmulq_x_s16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_s32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_s8.c           | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_u16.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_u32.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vmulq_x_u8.c           | 26 ++++++++++--
 48 files changed, 1148 insertions(+), 160 deletions(-)
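
Note for reviewers: the sketch below is a simplified model of why the
check-function-bodies templates in this patch are stricter than the
scan-assembler directives they replace. It is not the DejaGnu
implementation. The helper names (build_body_regex, matches_body,
old_style_scan) and the sample assembly strings are made up for
illustration. Only the template lines are taken from the patch (foo in
vmulq_m_f16.c, with the leading "**" removed). It assumes that a "..."
line stands for any number of whole lines and that every other line is a
regex for exactly one line of the function body.

```python
import re

# Template for `foo` in vmulq_m_f16.c, with the "**" prefix stripped.
TEMPLATE = [
    "...",
    r"\tvmsr\tp0, (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
    r"\tvpst(?:\t@.*|)",
    "...",
    r"\tvmult.f16\tq[0-9]+, q[0-9]+, q[0-9]+(?:\t@.*|)",
    "...",
]


def build_body_regex(template):
    """Join the template lines into one regex covering a whole body."""
    parts = []
    for line in template:
        if line == "...":
            # Assumption: "..." skips zero or more complete lines.
            parts.append(r"(?:.*\n)*?")
        else:
            # Any other line must match exactly one line of assembly.
            parts.append(line + r"\n")
    return re.compile("".join(parts))


def matches_body(template, body):
    """True if the whole body matches the template, in order."""
    return build_body_regex(template).fullmatch(body) is not None


def old_style_scan(body):
    """Model of the removed directives: two unordered pattern searches."""
    return bool(re.search("vpst", body)) and bool(re.search("vmult.f16", body))


# Hypothetical compiler output, not taken from a real build.
GOOD = (
    "\tvmsr\tp0, r0\t@ movhi\n"
    "\tvpst\n"
    "\tvmult.f16\tq0, q1, q2\n"
    "\tbx\tlr\n"
)

# Same mnemonics, but the predicated multiply comes before vpst.
WRONG_ORDER = (
    "\tvmsr\tp0, r0\n"
    "\tvmult.f16\tq0, q1, q2\n"
    "\tvpst\n"
    "\tbx\tlr\n"
)

if __name__ == "__main__":
    print("good body, template match:  ", matches_body(TEMPLATE, GOOD))
    print("wrong order, template match:", matches_body(TEMPLATE, WRONG_ORDER))
    print("wrong order, old-style scan:", old_style_scan(WRONG_ORDER))
```

In this model the old pair of searches accepts the misordered body,
while the template rejects it, because the template fixes the instruction
order (vmsr, then vpst, then the predicated vmult) and the operand
register classes.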

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
index 68fb012ad34..9251809bfa1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vmulq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
index 512661aeec7..3dacb7ad77c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vmulq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
index d05d48f6261..8f47e962633 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmulq_m_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
index 8c2ec81da3b..41f3786e5fe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmulq_m_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
index 1f1d408d5b9..2f4fecbf56b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vmulq_m_n_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vmulq_m (inactive, a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
index 4aae0849e2b..2ad4108d637 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vmulq_m_n_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vmulq_m (inactive, a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
index 9a87f7d3643..b10bd5af687 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vmulq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
index da7d38b9968..e8bdf7278ad 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vmulq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
index 227b3a50a92..001e888e075 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vmulq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
index e09334df1de..5015f20a4be 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vmulq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vmulq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
index 62d6c262e5a..a6013a42721 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vmulq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vmulq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
index e7993ab3c31..42fc7264229 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vmulq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vmulq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
index 61cdf656c19..04fdc010f5b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmulq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
index 622407b96da..96178d02e37 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmulq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
index bb2943cc727..aa3b8061122 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmulq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
index a0680174753..e56ab77f3ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmulq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
index 586a32560d7..72e313cfd78 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmulq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
index 0a8e49a5982..1ae6a93934c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmulq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmulq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
index a3f693f06f7..d77aeb219ca 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16_t b)
 {
   return vmulq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmul.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t a)
+{
+  return vmulq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
index 5d1cfa368a7..9ef6a21b2bd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32_t b)
 {
   return vmulq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmul.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t a)
+{
+  return vmulq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
index 98e84cbf202..7ea25dce4a7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vmulq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
index adbfd6fe10b..b884603ac5b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vmulq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
index c845f108f88..8e6e17cd593 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vmulq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
index e52acdc53b9..907bb0a4009 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vmulq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vmulq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
index 9da4bc1f359..1164b29fc76 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vmulq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vmulq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
index e0f152db729..ccc950e3ccf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vmulq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vmulq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
index 89cc604fda0..a1fc1fc8f04 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vmulq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
index f87fbf1249c..4fcf0dd88d1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vmulq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
index 4e40065ad22..d0c147ef912 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vmulq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
index ae95bf68afe..d4a24ba95b6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vmulq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
index 4f8e9762d5f..c9194b73eaf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vmulq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
index a3776ff8314..d69402021ec 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vmulq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vmulq (a, b);
 }
 
-/* { dg-final { scan-assembler "vmul.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
index 1f864cf481a..169871b47d8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmulq_x_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
index 07cc3d0277c..f800731b3ff 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmulq_x_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
index 8fa6c759d54..a4dc47725b5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vmulq_x_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vmulq_x (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
index 654713c1348..e8428fe9b2d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vmulq_x_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vmulq_x (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
index 4ec5ab397e1..27ef55d932a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vmulq_x_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
index c52180067cf..929f420bd4c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vmulq_x_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
index a2a7c734de8..31885a2d90f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vmulq_x_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
index 419a3cb6ea6..5972a525092 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vmulq_x_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vmulq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
index 5acfcf6bf61..3e02a542988 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vmulq_x_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vmulq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
index 27e95ced0b5..9b59b189a5f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vmulq_x_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vmulq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
index 5c232bfdc34..09b7169a68b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmulq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
index 685fe45e4d0..a57ef2da840 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmulq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
index 19ecc6bcafc..7fb5e007990 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmulq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
index 0700ca818ab..7b1c6b2acc8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmulq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
index a1cb2aa221e..bc53faff33f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmulq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
index 3b29852c830..f43760861d4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmulq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vmulq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmult.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1
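
Illustrative note, not part of the patch: the tests above replace the
unanchored `scan-assembler "vmul.i8"` directives with full-line expectations
that also constrain the operands. The sketch below models that difference
with Python's `re` module. It is a simplified model under stated assumptions:
each expectation is applied to a single assembly line with its leading tab
dropped, and the helper names `strict_ok` and `loose_ok` and the sample
assembly lines are hypothetical. It does not reproduce how DejaGnu actually
assembles the function-body regex.

```python
import re

# Operand pattern copied from the vmulq_n_u8.c expectation above;
# "\t" stands for the literal tab used in the test file.
STRICT = re.compile(r"vmul.i8\tq[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:\t@.*|)")
# The pattern used by the removed scan-assembler directive.
LOOSE = re.compile(r"vmul.i8")


def strict_ok(line: str) -> bool:
    """Whole-line match after dropping the leading tab (assumption of this sketch)."""
    return STRICT.fullmatch(line.lstrip("\t")) is not None


def loose_ok(line: str) -> bool:
    """Unanchored search: any line containing the mnemonic is accepted."""
    return LOOSE.search(line) is not None


# Hypothetical assembly lines, written by hand for illustration.
good = "\tvmul.i8\tq0, q0, r0"
commented = "\tvmul.i8\tq0, q0, ip\t@ some comment"
wrong_operand = "\tvmul.i8\tq0, q0, q1"  # vector register where a core register is expected
extra_space = "\tvmul.i8\t q0, q0, r0"   # stray space after the tab

for line in (good, commented, wrong_operand, extra_space):
    print(repr(line), "strict:", strict_ok(line), "loose:", loose_ok(line))
```

In this model the loose pattern accepts all four lines, while the strict one
rejects the wrong operand kind and the stray space. That is the sort of
mismatch the reworked tests are intended to catch.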



* [PATCH 19/35] arm: improve tests and fix vsubq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (17 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 18/35] arm: improve tests for vmulq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:51   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 20/35] arm: improve tests for vfmasq_m* Andrea Corallo
                   ` (16 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vsubq_n_f<mode>): Fix spacing.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vsubq_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vsubq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_u8.c: Likewise.
---
 gcc/config/arm/mve.md                         |  2 +-
 .../gcc.target/arm/mve/intrinsics/vsubq_f16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vsubq_f32.c | 16 ++++++-
 .../arm/mve/intrinsics/vsubq_m_f16.c          | 26 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_f32.c          | 26 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_f16.c        | 42 ++++++++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_f32.c        | 42 ++++++++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_s16.c        | 26 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_s32.c        | 26 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_s8.c         | 26 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_u16.c        | 42 ++++++++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_u32.c        | 42 ++++++++++++++--
 .../arm/mve/intrinsics/vsubq_m_n_u8.c         | 42 ++++++++++++++--
 .../arm/mve/intrinsics/vsubq_m_s16.c          | 25 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_s32.c          | 25 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_s8.c           | 25 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_u16.c          | 25 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_u32.c          | 25 ++++++++--
 .../arm/mve/intrinsics/vsubq_m_u8.c           | 25 ++++++++--
 .../arm/mve/intrinsics/vsubq_n_f16.c          | 28 ++++++++++-
 .../arm/mve/intrinsics/vsubq_n_f32.c          | 28 ++++++++++-
 .../arm/mve/intrinsics/vsubq_n_s16.c          | 17 +++++--
 .../arm/mve/intrinsics/vsubq_n_s32.c          | 17 +++++--
 .../arm/mve/intrinsics/vsubq_n_s8.c           | 17 +++++--
 .../arm/mve/intrinsics/vsubq_n_u16.c          | 29 +++++++++--
 .../arm/mve/intrinsics/vsubq_n_u32.c          | 29 +++++++++--
 .../arm/mve/intrinsics/vsubq_n_u8.c           | 29 +++++++++--
 .../gcc.target/arm/mve/intrinsics/vsubq_s16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vsubq_s32.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vsubq_s8.c  | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vsubq_u16.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vsubq_u32.c | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vsubq_u8.c  | 16 ++++++-
 .../arm/mve/intrinsics/vsubq_x_f16.c          | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_f32.c          | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_f16.c        | 48 +++++++++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_f32.c        | 48 +++++++++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_s16.c        | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_s32.c        | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_s8.c         | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_u16.c        | 48 +++++++++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_u32.c        | 48 +++++++++++++++++--
 .../arm/mve/intrinsics/vsubq_x_n_u8.c         | 48 +++++++++++++++++--
 .../arm/mve/intrinsics/vsubq_x_s16.c          | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_s32.c          | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_s8.c           | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_u16.c          | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_u32.c          | 32 +++++++++++--
 .../arm/mve/intrinsics/vsubq_x_u8.c           | 32 +++++++++++--
 49 files changed, 1261 insertions(+), 145 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 5ce2a289225..714dc6fc7ce 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -679,7 +679,7 @@ (define_insn "mve_vsubq_n_f<mode>"
 	 VSUBQ_N_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vsub.f<V_sz_elem>  %q0, %q1, %2"
+  "vsub.f<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
index 8e3ce24fa49..3d82b081ca2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b)
 {
   return vsubq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16x8_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
index 5cb239d70fa..d0f64bb9872 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b)
 {
   return vsubq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32x4_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
index f4b3f806822..434b0a7ced8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vsubq_m_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
index 75dbf9335c9..0b8e056647e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vsubq_m_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
index 556a0845087..abbd60060a7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vsubq_m_n_f16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
index e53f5f1966a..40ca4284a1f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vsubq_m_n_f32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
index 73443d500ba..f13eff8ad2d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vsubq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
index b4031111678..21ba17ba869 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vsubq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
index 5c4e1019225..c75b8b5420d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vsubq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c
index 04a3036ede8..700bc01833c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vsubq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c
index a21f9366373..25dd37ae5b2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vsubq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c
index 18f635f1e1a..4fed154d258 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vsubq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
index 598d648887b..dde77dc51b7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vsubq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
index af6750278f1..8770e31ad95 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vsubq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
index 5effbe2e017..c9813313594 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vsubq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
index 12218ae6791..eebc3ad6929 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vsubq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
index 3a63eeb2b3d..d85bbec7ebf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vsubq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
index a17a2741a47..a104a74e259 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vsubq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
index 10e27dae907..4db52649ab4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16_t b)
 {
   return vsubq_n_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo1 (float16x8_t a, float16_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vsub.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t a)
+{
+  return vsubq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
index 9e16d6c075c..fe97eed7d37 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32_t b)
 {
   return vsubq_n_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo1 (float32x4_t a, float32_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vsub.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t a)
+{
+  return vsubq (a, 1.1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c
index 7f2af8691c0..d695fc83e06 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c
@@ -1,22 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vsubq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c
index a5e6bf486fd..c281e21ab0c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c
@@ -1,22 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vsubq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c
index 5754379358d..ef36b4d6330 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c
@@ -1,22 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vsubq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c
index ea0a3f9260c..be754d894a8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vsubq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
+/*
+**foo2:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c
index cc409b59438..ef0aaa4cf08 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vsubq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
+/*
+**foo2:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c
index 8a18a89b353..c55aefc3307 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c
@@ -1,22 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vsubq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
+/*
+**foo2:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c
index 15e732f1f66..469395452bd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vsubq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c
index 5b4ee855711..0e60e1c6f60 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vsubq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c
index b23893af605..882d63dfcf7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vsubq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c
index edb5e354411..fe9baf3d52c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vsubq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c
index 68040afd52b..b82051d69d5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vsubq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c
index 92c4f059b0e..630b2f79f1f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vsubq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vsub.i8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c
index 4cb8be0ea7f..c48bea7e9f0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
 {
-    return vsubq_x_f16 (a, b, p);
+  return vsubq_x_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
index f6711d7f207..d3e129bb6ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
 {
-    return vsubq_x_f32 (a, b, p);
+  return vsubq_x_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
index c4adacbf5be..2dcaff58c09 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
@@ -1,15 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16x8_t a, float16_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_f16 (a, b, p);
+  return vsubq_x_n_f16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t a, mve_pred16_t p)
+{
+  return vsubq_x (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
index a4affa0a3a9..92bafa3c4cc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
@@ -1,15 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32x4_t a, float32_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_f32 (a, b, p);
+  return vsubq_x_n_f32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t a, mve_pred16_t p)
+{
+  return vsubq_x (a, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
index 99c59b1a6c1..f01e8d7d490 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_s16 (a, b, p);
+  return vsubq_x_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int16x8_t
+foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
index 6c29ebec05c..506966424cc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_s32 (a, b, p);
+  return vsubq_x_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int32x4_t
+foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
index 0f83c305473..3c4a5d8129c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_s8 (a, b, p);
+  return vsubq_x_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int8x16_t
+foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
index 9a372d762d1..958e5aa2ce8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
@@ -1,15 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_u16 (a, b, p);
+  return vsubq_x_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vsubq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
index 5219f154fa9..ba39c75bb2b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
@@ -1,15 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_u32 (a, b, p);
+  return vsubq_x_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vsubq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
index 0a0bcf8623a..19204d1d80f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
@@ -1,15 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
-    return vsubq_x_n_u8 (a, b, p);
+  return vsubq_x_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vsubq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
index 37936a6d647..8dcc5477c6f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
-    return vsubq_x_s16 (a, b, p);
+  return vsubq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+int16x8_t
+foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
index c085f59c6a2..a2d43323227 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
-    return vsubq_x_s32 (a, b, p);
+  return vsubq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
index 361507821ea..8ead3d22439 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
-    return vsubq_x_s8 (a, b, p);
+  return vsubq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+int8x16_t
+foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
index 21423dc4f80..f0faf8165d2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
-    return vsubq_x_u16 (a, b, p);
+  return vsubq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
index 38dd09ad8f7..67a70931859 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
-    return vsubq_x_u32 (a, b, p);
+  return vsubq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
index 406cbf760fd..19002336cbd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
@@ -1,15 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
-    return vsubq_x_u8 (a, b, p);
+  return vsubq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vsubt.i8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
+{
+  return vsubq_x (a, b, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 20/35] arm: improve tests for vfmasq_m*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (18 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 19/35] arm: improve tests and fix vsubq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:52   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 21/35] arm: improve tests for vhaddq_m* Andrea Corallo
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c: Likewise.
---
 .../arm/mve/intrinsics/vfmasq_m_n_f16.c       | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vfmasq_m_n_f32.c       | 50 ++++++++++++++++---
 2 files changed, 84 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
index 06d2d114e46..03b376c9bbe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vfmast.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
-foo (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
+foo (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
 {
-  return vfmasq_m_n_f16 (a, b, c, p);
+  return vfmasq_m_n_f16 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vfmast.f16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vfmast.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
-foo1 (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
+foo1 (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
 {
-  return vfmasq_m (a, b, c, p);
+  return vfmasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vfmast.f16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vfmast.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t m1, float16x8_t m2, mve_pred16_t p)
+{
+  return vfmasq_m (m1, m2, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
index bf1773d0eeb..ecf30ba9826 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vfmast.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
-foo (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
+foo (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
 {
-  return vfmasq_m_n_f32 (a, b, c, p);
+  return vfmasq_m_n_f32 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vfmast.f32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vfmast.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
-foo1 (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
+foo1 (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
 {
-  return vfmasq_m (a, b, c, p);
+  return vfmasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vfmast.f32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vfmast.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t m1, float32x4_t m2, mve_pred16_t p)
+{
+  return vfmasq_m (m1, m2, 1.1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 21/35] arm: improve tests for vhaddq_m*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (19 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 20/35] arm: improve tests for vfmasq_m* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:53   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 22/35] arm: improve tests for vhsubq_m* Andrea Corallo
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vhaddq_m_n_s16.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_n_s32.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_n_s8.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_n_u16.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_n_u32.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_n_u8.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_s16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_s32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_s8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_u16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_u32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_m_u8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_n_s16.c         | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_n_s32.c         | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_n_s8.c          | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_n_u16.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vhaddq_n_u32.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vhaddq_n_u8.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vhaddq_s16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_s32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vhaddq_s8.c | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_u16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_u32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vhaddq_u8.c | 16 ++++++-
 .../arm/mve/intrinsics/vhaddq_x_n_s16.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_x_n_s32.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_x_n_s8.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vhaddq_x_n_u16.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhaddq_x_n_u32.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhaddq_x_n_u8.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhaddq_x_s16.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhaddq_x_s32.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhaddq_x_s8.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vhaddq_x_u16.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhaddq_x_u32.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhaddq_x_u8.c          | 25 +++++++++--
 36 files changed, 828 insertions(+), 114 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
index e90af963697..0bd03832ff5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhaddq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c
index fcce85fd1bd..42fe35dc746 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhaddq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c
index 56558b7033a..1f4a4016c74 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhaddq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c
index d7ee0febab9..7d7ebebd638 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhaddq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vhaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c
index 1117b9813ce..31f7ee2fa54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhaddq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vhaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c
index 90c66595d3f..2120472af46 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhaddq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vhaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c
index e8b87283a73..4b4ce40efb8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhaddq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c
index ddcfd11198e..e532055c675 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhaddq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c
index ef5fcd02cc5..25b81629ec3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhaddq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c
index d7b9aaab62c..4a9e9f3f438 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhaddq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c
index c8d7f6c4cf3..1e68099ebf2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhaddq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c
index 9792941b091..6dd75d7336e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhaddq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c
index d0d77f5a7fd..20a999da1d2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vhaddq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c
index a8b4f3415a1..986cb8d3ba5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vhaddq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c
index 2459ba0a7ab..57a4b36f5fe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vhaddq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c
index cd681e7a5f9..abed33b0e37 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vhaddq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vhadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vhaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c
index d2cb7f6284e..5e5204fb3a7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vhaddq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vhadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vhaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c
index 509e1746259..b35221ef81b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vhaddq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vhadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vhaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c
index 47afc591cdb..310964f3440 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vhaddq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c
index fdc6476d0ee..d8222645c21 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vhaddq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c
index 3321765e909..85b2feee346 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vhaddq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c
index ad46355feab..2da0aa053e5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vhaddq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
index 7477585fe55..49b865a123b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vhaddq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
index 9edf8e5eb90..5ecd3cbf6ec 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vhaddq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vhaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhadd.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
index 5a9302129c7..a4e277d4e1f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhaddq_x_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
index 0a4ef00afa1..c79b88d6ced 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhaddq_x_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
index ae6c27a8878..61893536231 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhaddq_x_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
index ddc99a82f79..146d226f36f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhaddq_x_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vhaddq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
index dce9bc212e2..b70014fb6a5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhaddq_x_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vhaddq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
index 262c5937a91..03978dfa28a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhaddq_x_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vhaddq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
index 65df0093401..c3c787583dd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhaddq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
index 7ff76e7170a..a1ab196d3d2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhaddq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
index 23f545c45cd..061ae89315e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhaddq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
index 97674c1f73c..0ee88520f8f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhaddq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
index b6404ce9d17..0a0e512c5fc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhaddq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
index 7c2d74a2662..c495641c532 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhaddq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhaddq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 22/35] arm: improve tests for vhsubq_m*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (20 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 21/35] arm: improve tests for vhaddq_m* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:53   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 23/35] arm: improve tests for viwdupq* Andrea Corallo
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vhsubq_m_n_s16.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_n_s32.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_n_s8.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_n_u16.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_n_u32.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_n_u8.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_s16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_s32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_s8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_u16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_u32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_m_u8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_n_s16.c         | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_n_s32.c         | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_n_s8.c          | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_n_u16.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vhsubq_n_u32.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vhsubq_n_u8.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vhsubq_s16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_s32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vhsubq_s8.c | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_u16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_u32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vhsubq_u8.c | 16 ++++++-
 .../arm/mve/intrinsics/vhsubq_x_n_s16.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_x_n_s32.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_x_n_s8.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vhsubq_x_n_u16.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhsubq_x_n_u32.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhsubq_x_n_u8.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vhsubq_x_s16.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhsubq_x_s32.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhsubq_x_s8.c          | 25 +++++++++--
 .../arm/mve/intrinsics/vhsubq_x_u16.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhsubq_x_u32.c         | 25 +++++++++--
 .../arm/mve/intrinsics/vhsubq_x_u8.c          | 25 +++++++++--
 36 files changed, 828 insertions(+), 114 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
index 27dcb7be957..6390589808f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhsubq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
index 75ae735f30d..db09d0f2c21 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhsubq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
index 84cdeb42952..89ea3f2aaf8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhsubq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
index bc6610c3812..e6fb8be673b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhsubq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vhsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c
index e94bfc95027..7ab815d5623 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhsubq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vhsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c
index c2a5674afd1..0bf695aded4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhsubq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vhsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c
index 9f62a385554..3bad177ad28 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhsubq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c
index 486ae6b7d58..cc5cdb07059 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhsubq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c
index 9faaa4fbb0d..4c651091e59 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhsubq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c
index aa5838cdad2..daed202c055 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhsubq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c
index 00282ad6444..cf71e6dab13 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhsubq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c
index 187d5bcf8a1..a8183dd48ed 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhsubq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c
index ce766486aed..af4f534d7ff 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vhsubq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c
index 1d820ffaf5a..941d38074a4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vhsubq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c
index 90110b78f0d..9ceb4ef3c6f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vhsubq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c
index e744ef58663..037ed2c637d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vhsubq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vhsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vhsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c
index b1ce3f07904..f51eb10ecbf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vhsubq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vhsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vhsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c
index 68872a8f900..24dd45db152 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vhsubq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vhsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vhsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c
index 03bd6d595cb..0f275d48753 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vhsubq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c
index 515acb84e66..21aeb9d2a59 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vhsubq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c
index 41fb2589924..b3ee94341b5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vhsubq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c
index dda18779dca..690ef2de5ba 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vhsubq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c
index 86a5576bedf..cfe12573fa0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vhsubq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c
index d339ca0e5e4..1926bc34219 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vhsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vhsubq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vhsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vhsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vhsub.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c
index 09da5c2f040..fcda4c541a6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhsubq_x_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c
index f3c032987bc..55637221f21 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhsubq_x_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c
index 1d86f7d72b3..ecfe188f3fa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhsubq_x_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c
index df6b7ea427a..bf3d6c38b85 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhsubq_x_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a, mve_pred16_t p)
+{
+  return vhsubq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c
index bea6f2d1f96..4ae75b09950 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhsubq_x_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a, mve_pred16_t p)
+{
+  return vhsubq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c
index e1fafd7a9f5..edfa4216a31 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhsubq_x_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a, mve_pred16_t p)
+{
+  return vhsubq_x (a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c
index c9d3ffb45b7..bd2771b0978 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhsubq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c
index 36343cffc85..0ea40df3d9e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhsubq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c
index d1b134fe480..90ee94defb0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhsubq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c
index 4da0fb3f340..d700741169a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhsubq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c
index dfb0a6d371f..f43c9626829 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhsubq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c
index d549892ef8b..a0908ba786b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhsubq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vhsubq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 23/35] arm: improve tests for viwdupq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (21 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 22/35] arm: improve tests for vhsubq_m* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:54   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 24/35] arm: improve tests for vmladavaq* Andrea Corallo
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c: Improve tests.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.
---
 .../arm/mve/intrinsics/viwdupq_m_n_u16.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_m_n_u32.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_m_n_u8.c       | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_m_wb_u16.c     | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_m_wb_u32.c     | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_m_wb_u8.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_n_u16.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/viwdupq_n_u32.c        | 32 ++++++++++--
 .../arm/mve/intrinsics/viwdupq_n_u8.c         | 28 ++++++++++-
 .../arm/mve/intrinsics/viwdupq_wb_u16.c       | 36 ++++++++++---
 .../arm/mve/intrinsics/viwdupq_wb_u32.c       | 36 ++++++++++---
 .../arm/mve/intrinsics/viwdupq_wb_u8.c        | 36 ++++++++++---
 .../arm/mve/intrinsics/viwdupq_x_n_u16.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_x_n_u32.c      | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_x_n_u8.c       | 46 ++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_x_wb_u16.c     | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_x_wb_u32.c     | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/viwdupq_x_wb_u8.c      | 50 ++++++++++++++++---
 18 files changed, 658 insertions(+), 106 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
index 0f999cc672b..67a2465f435 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m_n_u16 (inactive, a, b, 2, p);
+  return viwdupq_m_n_u16 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m (inactive, a, b, 2, p);
+  return viwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return viwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c
index f79c91eaf4c..9fc2518acc5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m_n_u32 (inactive, a, b, 4, p);
+  return viwdupq_m_n_u32 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m (inactive, a, b, 4, p);
+  return viwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return viwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c
index c0fee9fa752..39f4071bfa1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m_n_u8 (inactive, a, b, 8, p);
+  return viwdupq_m_n_u8 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m (inactive, a, b, 8, p);
+  return viwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return viwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c
index 468ba179f62..8bb680e0d77 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m_wb_u16 (inactive, a, b, 2, p);
+  return viwdupq_m_wb_u16 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m (inactive, a, b, 2, p);
+  return viwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, mve_pred16_t p)
+{
+  return viwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c
index e9190302717..2dc8d5f3442 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m_wb_u32 (inactive, a, b, 4, p);
+  return viwdupq_m_wb_u32 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m (inactive, a, b, 4, p);
+  return viwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, mve_pred16_t p)
+{
+  return viwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c
index 309ce95a333..ff3a5f520e8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m_wb_u8 (inactive, a, b, 8, p);
+  return viwdupq_m_wb_u8 (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_m (inactive, a, b, 8, p);
+  return viwdupq_m (inactive, a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, mve_pred16_t p)
+{
+  return viwdupq_m (inactive, 1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c
index 599d9078464..5f37290759a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a, uint32_t b)
 {
-  return viwdupq_n_u16 (a, b, 2);
+  return viwdupq_n_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a, uint32_t b)
 {
-  return viwdupq_u16 (a, b, 2);
+  return viwdupq_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return viwdupq_u16 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c
index 7c2af74b3f0..de93f8a7ec4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, uint32_t b)
 {
-  return viwdupq_n_u32 (a, b, 4);
+  return viwdupq_n_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a, uint32_t b)
 {
-  return viwdupq_u32 (a, b, 4);
+  return viwdupq_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return viwdupq_u32 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c
index 4ff60791f3b..089025c3401 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a, uint32_t b)
 {
   return viwdupq_n_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a, uint32_t b)
 {
   return viwdupq_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return viwdupq_u8 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c
index 1e5ce88dcca..fc3e9c6fac4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (uint32_t * a, uint32_t b)
+foo (uint32_t *a, uint32_t b)
 {
-  return viwdupq_wb_u16 (a, b, 4);
+  return viwdupq_wb_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (uint32_t * a, uint32_t b)
+foo1 (uint32_t *a, uint32_t b)
 {
-  return viwdupq_u16 (a, b, 4);
+  return viwdupq_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u16"  }  } */
+/*
+**foo2:
+**	...
+**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 ()
+{
+  return viwdupq_u16 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c
index 0c076f7b751..4c098dd8f02 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (uint32_t * a, uint32_t b)
+foo (uint32_t *a, uint32_t b)
 {
-  return viwdupq_wb_u32 (a, b, 8);
+  return viwdupq_wb_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (uint32_t * a, uint32_t b)
+foo1 (uint32_t *a, uint32_t b)
 {
-  return viwdupq_u32 (a, b, 8);
+  return viwdupq_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u32"  }  } */
+/*
+**foo2:
+**	...
+**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 ()
+{
+  return viwdupq_u32 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c
index 9e5118ba2b6..44cb53fe344 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (uint32_t * a, uint32_t b)
+foo (uint32_t *a, uint32_t b)
 {
-  return viwdupq_wb_u8 (a, b, 2);
+  return viwdupq_wb_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (uint32_t * a, uint32_t b)
+foo1 (uint32_t *a, uint32_t b)
 {
-  return viwdupq_u8 (a, b, 2);
+  return viwdupq_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "viwdup.u8"  }  } */
+/*
+**foo2:
+**	...
+**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 ()
+{
+  return viwdupq_u8 (1, 1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c
index fdaf6be282d..2242877881f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_n_u16 (a, b, 2, p);
+  return viwdupq_x_n_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_u16 (a, b, 2, p);
+  return viwdupq_x_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (mve_pred16_t p)
+{
+  return viwdupq_x_u16 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c
index affc6162015..4b2b650e21a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_n_u32 (a, b, 4, p);
+  return viwdupq_x_n_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_u32 (a, b, 4, p);
+  return viwdupq_x_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (mve_pred16_t p)
+{
+  return viwdupq_x_u32 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c
index 8137c623c2a..873952b6c2e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_n_u8 (a, b, 8, p);
+  return viwdupq_x_n_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_u8 (a, b, 8, p);
+  return viwdupq_x_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (mve_pred16_t p)
+{
+  return viwdupq_x_u8 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c
index d7aa141f384..b6c94797380 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_wb_u16 (a, b, 8, p);
+  return viwdupq_x_wb_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_u16 (a, b, 8, p);
+  return viwdupq_x_u16 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (mve_pred16_t p)
+{
+  return viwdupq_x_u16 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c
index 7fe56963452..5fd84963d01 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_wb_u32 (a, b, 2, p);
+  return viwdupq_x_wb_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_u32 (a, b, 2, p);
+  return viwdupq_x_u32 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (mve_pred16_t p)
+{
+  return viwdupq_x_u32 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c
index 8e3ecefdedb..abbb40fa8da 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_wb_u8 (a, b, 4, p);
+  return viwdupq_x_wb_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
+foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
 {
-  return viwdupq_x_u8 (a, b, 4, p);
+  return viwdupq_x_u8 (a, b, 1, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (mve_pred16_t p)
+{
+  return viwdupq_x_u8 (1, 1, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 24/35] arm: improve tests for vmladavaq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (22 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 23/35] arm: improve tests for viwdupq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:54   ` Kyrylo Tkachov
  2022-11-17 16:37 ` [PATCH 25/35] arm: improve tests and fix vmlaldavaxq* Andrea Corallo
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c: Improve tests.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c: Likewise.
---
 .../arm/mve/intrinsics/vmladavaq_p_s16.c      | 33 ++++++++++---
 .../arm/mve/intrinsics/vmladavaq_p_s32.c      | 33 ++++++++++---
 .../arm/mve/intrinsics/vmladavaq_p_s8.c       | 33 ++++++++++---
 .../arm/mve/intrinsics/vmladavaq_p_u16.c      | 49 ++++++++++++++++---
 .../arm/mve/intrinsics/vmladavaq_p_u32.c      | 49 ++++++++++++++++---
 .../arm/mve/intrinsics/vmladavaq_p_u8.c       | 49 ++++++++++++++++---
 .../arm/mve/intrinsics/vmladavaxq_p_s16.c     | 33 ++++++++++---
 .../arm/mve/intrinsics/vmladavaxq_p_s32.c     | 33 ++++++++++---
 .../arm/mve/intrinsics/vmladavaxq_p_s8.c      | 33 ++++++++++---
 .../arm/mve/intrinsics/vmladavaxq_s16.c       | 24 ++++++---
 .../arm/mve/intrinsics/vmladavaxq_s32.c       | 24 ++++++---
 .../arm/mve/intrinsics/vmladavaxq_s8.c        | 24 ++++++---
 12 files changed, 336 insertions(+), 81 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
index e458204c41b..f3e5eba3b08 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p_s16 (a, b, c, p);
+  return vmladavaq_p_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo1 (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p (a, b, c, p);
+  return vmladavaq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
-/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
index e3544787adb..71f6957bfc5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p_s32 (a, b, c, p);
+  return vmladavaq_p_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo1 (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p (a, b, c, p);
+  return vmladavaq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
-/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c
index 1d4ca722f44..a74317aeff9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
+foo (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p_s8 (a, b, c, p);
+  return vmladavaq_p_s8 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
+foo1 (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p (a, b, c, p);
+  return vmladavaq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.s8"  }  } */
-/* { dg-final { scan-assembler "vmladavat.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c
index 91a11c8b8b1..9ac84d46a07 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
+foo (uint32_t add, uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p_u16 (a, b, c, p);
+  return vmladavaq_p_u16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo1 (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
+foo1 (uint32_t add, uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p (a, b, c, p);
+  return vmladavaq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.u16"  }  } */
-/* { dg-final { scan-assembler "vmladavat.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
+{
+  return vmladavaq_p (1, m1, m2, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c
index 0efe8d0902f..4a3d109ed90 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
+foo (uint32_t add, uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p_u32 (a, b, c, p);
+  return vmladavaq_p_u32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo1 (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
+foo1 (uint32_t add, uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p (a, b, c, p);
+  return vmladavaq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.u32"  }  } */
-/* { dg-final { scan-assembler "vmladavat.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
+{
+  return vmladavaq_p (1, m1, m2, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c
index a8da9b0d2ef..a17440f4675 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c
@@ -1,22 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
+foo (uint32_t add, uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p_u8 (a, b, c, p);
+  return vmladavaq_p_u8 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32_t
-foo1 (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
+foo1 (uint32_t add, uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
 {
-  return vmladavaq_p (a, b, c, p);
+  return vmladavaq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavat.u8"  }  } */
-/* { dg-final { scan-assembler "vmladavat.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavat.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint32_t
+foo2 (uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
+{
+  return vmladavaq_p (1, m1, m2, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c
index 838717e3e43..f201d5fa047 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavaxt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmladavaxq_p_s16 (a, b, c, p);
+  return vmladavaxq_p_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavaxt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavaxt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo1 (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmladavaxq_p (a, b, c, p);
+  return vmladavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavaxt.s16"  }  } */
-/* { dg-final { scan-assembler "vmladavaxt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c
index a50c5ecf802..c90647a5064 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavaxt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmladavaxq_p_s32 (a, b, c, p);
+  return vmladavaxq_p_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavaxt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavaxt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo1 (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmladavaxq_p (a, b, c, p);
+  return vmladavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavaxt.s32"  }  } */
-/* { dg-final { scan-assembler "vmladavaxt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c
index e4705cecad9..57af7bc1c78 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavaxt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
+foo (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
 {
-  return vmladavaxq_p_s8 (a, b, c, p);
+  return vmladavaxq_p_s8 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavaxt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmladavaxt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
+foo1 (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
 {
-  return vmladavaxq_p (a, b, c, p);
+  return vmladavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmladavaxt.s8"  }  } */
-/* { dg-final { scan-assembler "vmladavaxt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c
index ffd542a062f..684580d1c36 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmladavax.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int16x8_t b, int16x8_t c)
+foo (int32_t add, int16x8_t m1, int16x8_t m2)
 {
-  return vmladavaxq_s16 (a, b, c);
+  return vmladavaxq_s16 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmladavax.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmladavax.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int16x8_t b, int16x8_t c)
+foo1 (int32_t add, int16x8_t m1, int16x8_t m2)
 {
-  return vmladavaxq (a, b, c);
+  return vmladavaxq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmladavax.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c
index b91e54d79e6..5d152647b55 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmladavax.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int32x4_t b, int32x4_t c)
+foo (int32_t add, int32x4_t m1, int32x4_t m2)
 {
-  return vmladavaxq_s32 (a, b, c);
+  return vmladavaxq_s32 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmladavax.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmladavax.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int32x4_t b, int32x4_t c)
+foo1 (int32_t add, int32x4_t m1, int32x4_t m2)
 {
-  return vmladavaxq (a, b, c);
+  return vmladavaxq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmladavax.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c
index 61949c416fc..71bcdc9b55e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmladavax.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo (int32_t a, int8x16_t b, int8x16_t c)
+foo (int32_t add, int8x16_t m1, int8x16_t m2)
 {
-  return vmladavaxq_s8 (a, b, c);
+  return vmladavaxq_s8 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmladavax.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmladavax.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32_t
-foo1 (int32_t a, int8x16_t b, int8x16_t c)
+foo1 (int32_t add, int8x16_t m1, int8x16_t m2)
 {
-  return vmladavaxq (a, b, c);
+  return vmladavaxq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmladavax.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (23 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 24/35] arm: improve tests for vmladavaq* Andrea Corallo
@ 2022-11-17 16:37 ` Andrea Corallo
  2022-11-22 16:56   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 26/35] arm: improve tests for vmlasq* Andrea Corallo
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vmlaldavaq_<supf><mode>)
	(mve_vmlaldavaxq_s<mode>, mve_vmlaldavaxq_p_<supf><mode>): Separate
	the mnemonic from its operands with a tab instead of a space.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c: Improve tests.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c: Likewise.
---
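Note on why the separator matters for the new tests: the expected bodies
below require a tab between the mnemonic and its operands, so the old
space-separated templates would not match. The sketch below is only an
illustration under stated assumptions. It is not how the testsuite
evaluates check-function-bodies. The helper `line_matches` and the array
`vmlaldavax_s16_pattern` are hypothetical names. The pattern is a POSIX
ERE rendering of the vmlaldavaxq_s16 body line: plain groups replace
"(?:...)", the optional assembler comment is written "(\t@.*)?", the dot
is escaped, and whole-line anchoring is assumed.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the GCC testsuite): return true when
   LINE matches the POSIX extended regular expression PATTERN.  */
static bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;

  /* Treat a pattern that fails to compile as a non-match.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* ERE rendering of the expected vmlaldavax.s16 line.  "\t" is a literal
   tab: one before the mnemonic and one between mnemonic and operands.
   The anchors "^" and "$" are an assumption of this sketch.  */
static const char vmlaldavax_s16_pattern[] =
  "^\tvmlaldavax\\.s16\t(ip|fp|r[0-9]+), (ip|fp|r[0-9]+), "
  "q[0-9]+, q[0-9]+(\t@.*)?$";
```

With this sketch, a line such as "\tvmlaldavax.s16\tr0, r1, q0, q1"
matches (with or without a trailing "\t@ ..." comment), while the same
line with a space after the mnemonic does not.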
 gcc/config/arm/mve.md                         |  6 ++--
 .../arm/mve/intrinsics/vmlaldavaxq_p_s16.c    | 32 +++++++++++++++----
 .../arm/mve/intrinsics/vmlaldavaxq_p_s32.c    | 32 +++++++++++++++----
 .../arm/mve/intrinsics/vmlaldavaxq_s16.c      | 24 ++++++++++----
 .../arm/mve/intrinsics/vmlaldavaxq_s32.c      | 24 ++++++++++----
 5 files changed, 91 insertions(+), 27 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 714dc6fc7ce..d2ffae6a425 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -4163,7 +4163,7 @@ (define_insn "mve_vmlaldavaq_<supf><mode>"
 	 VMLALDAVAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vmlaldava.<supf>%#<V_sz_elem> %Q0, %R0, %q2, %q3"
+  "vmlaldava.<supf>%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
 
@@ -4179,7 +4179,7 @@ (define_insn "mve_vmlaldavaxq_s<mode>"
 	 VMLALDAVAXQ_S))
   ]
   "TARGET_HAVE_MVE"
-  "vmlaldavax.s%#<V_sz_elem> %Q0, %R0, %q2, %q3"
+  "vmlaldavax.s%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
 
@@ -6126,7 +6126,7 @@ (define_insn "mve_vmlaldavaxq_p_<supf><mode>"
 	 VMLALDAVAXQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vmlaldavaxt.<supf>%#<V_sz_elem> %Q0, %R0, %q2, %q3"
+  "vpst\;vmlaldavaxt.<supf>%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
index f33d3880236..87f0354a636 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlaldavaxt.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p_s16 (a, b, c, p);
+  return vmlaldavaxq_p_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlaldavaxt.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo1 (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo1 (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p (a, b, c, p);
+  return vmlaldavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
index ab072a9850e..d26bf5b90af 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlaldavaxt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p_s32 (a, b, c, p);
+  return vmlaldavaxq_p_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlaldavaxt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo1 (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p (a, b, c, p);
+  return vmlaldavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c
index e68fbd2df94..3a37e7a58a9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlaldavax.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo (int64_t a, int16x8_t b, int16x8_t c)
+foo (int64_t add, int16x8_t m1, int16x8_t m2)
 {
-  return vmlaldavaxq_s16 (a, b, c);
+  return vmlaldavaxq_s16 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmlaldavax.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlaldavax.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo1 (int64_t a, int16x8_t b, int16x8_t c)
+foo1 (int64_t add, int16x8_t m1, int16x8_t m2)
 {
-  return vmlaldavaxq (a, b, c);
+  return vmlaldavaxq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmlaldavax.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c
index 7b6fea289da..155b8be70f0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlaldavax.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo (int64_t a, int32x4_t b, int32x4_t c)
+foo (int64_t add, int32x4_t m1, int32x4_t m2)
 {
-  return vmlaldavaxq_s32 (a, b, c);
+  return vmlaldavaxq_s32 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmlaldavax.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlaldavax.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
-foo1 (int64_t a, int32x4_t b, int32x4_t c)
+foo1 (int64_t add, int32x4_t m1, int32x4_t m2)
 {
-  return vmlaldavaxq (a, b, c);
+  return vmlaldavaxq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vmlaldavax.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 26/35] arm: improve tests for vmlasq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (24 preceding siblings ...)
  2022-11-17 16:37 ` [PATCH 25/35] arm: improve tests and fix vmlaldavaxq* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 16:56   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 27/35] arm: improve tests for vqaddq_m* Andrea Corallo
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vmlasq_m_n_s16.c       | 34 ++++++++++---
 .../arm/mve/intrinsics/vmlasq_m_n_s32.c       | 34 ++++++++++---
 .../arm/mve/intrinsics/vmlasq_m_n_s8.c        | 34 ++++++++++---
 .../arm/mve/intrinsics/vmlasq_m_n_u16.c       | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vmlasq_m_n_u32.c       | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vmlasq_m_n_u8.c        | 50 ++++++++++++++++---
 .../arm/mve/intrinsics/vmlasq_n_s16.c         | 24 ++++++---
 .../arm/mve/intrinsics/vmlasq_n_s32.c         | 24 ++++++---
 .../arm/mve/intrinsics/vmlasq_n_s8.c          | 24 ++++++---
 .../arm/mve/intrinsics/vmlasq_n_u16.c         | 36 ++++++++++---
 .../arm/mve/intrinsics/vmlasq_n_u32.c         | 36 ++++++++++---
 .../arm/mve/intrinsics/vmlasq_n_u8.c          | 36 ++++++++++---
 12 files changed, 348 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
index bf66e616ec7..af6e588adad 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_s16 (a, b, c, p);
+  return vmlasq_m_n_s16 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
index 53c21e2e5b6..9d0cc3076d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_s32 (a, b, c, p);
+  return vmlasq_m_n_s32 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c
index ac08b15fdbe..772ad8b1e76 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_s8 (a, b, c, p);
+  return vmlasq_m_n_s8 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo1 (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c
index 99f1e28c7d5..b02dc64a31b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (uint16x8_t a, uint16x8_t b, uint16_t c, mve_pred16_t p)
+foo (uint16x8_t m1, uint16x8_t m2, uint16_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_u16 (a, b, c, p);
+  return vmlasq_m_n_u16 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (uint16x8_t a, uint16x8_t b, uint16_t c, mve_pred16_t p)
+foo1 (uint16x8_t m1, uint16x8_t m2, uint16_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
+{
+  return vmlasq_m (m1, m2, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c
index 8d8edca6024..0214cf2136e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (uint32x4_t a, uint32x4_t b, uint32_t c, mve_pred16_t p)
+foo (uint32x4_t m1, uint32x4_t m2, uint32_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_u32 (a, b, c, p);
+  return vmlasq_m_n_u32 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (uint32x4_t a, uint32x4_t b, uint32_t c, mve_pred16_t p)
+foo1 (uint32x4_t m1, uint32x4_t m2, uint32_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
+{
+  return vmlasq_m (m1, m2, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c
index e7f685bbcaa..c9824e332f7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (uint8x16_t a, uint8x16_t b, uint8_t c, mve_pred16_t p)
+foo (uint8x16_t m1, uint8x16_t m2, uint8_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_u8 (a, b, c, p);
+  return vmlasq_m_n_u8 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (uint8x16_t a, uint8x16_t b, uint8_t c, mve_pred16_t p)
+foo1 (uint8x16_t m1, uint8x16_t m2, uint8_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vmlast.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
+{
+  return vmlasq_m (m1, m2, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c
index 8bfe3c31096..6708a741790 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlas.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c)
+foo (int16x8_t m1, int16x8_t m2, int16_t add)
 {
-  return vmlasq_n_s16 (a, b, c);
+  return vmlasq_n_s16 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlas.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c)
+foo1 (int16x8_t m1, int16x8_t m2, int16_t add)
 {
-  return vmlasq (a, b, c);
+  return vmlasq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c
index db06182abec..4e8bf32e016 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlas.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c)
+foo (int32x4_t m1, int32x4_t m2, int32_t add)
 {
-  return vmlasq_n_s32 (a, b, c);
+  return vmlasq_n_s32 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlas.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c)
+foo1 (int32x4_t m1, int32x4_t m2, int32_t add)
 {
-  return vmlasq (a, b, c);
+  return vmlasq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
index 3a151650ef4..1cb1a31459c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlas.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c)
+foo (int8x16_t m1, int8x16_t m2, int8_t add)
 {
-  return vmlasq_n_s8 (a, b, c);
+  return vmlasq_n_s8 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlas.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c)
+foo1 (int8x16_t m1, int8x16_t m2, int8_t add)
 {
-  return vmlasq (a, b, c);
+  return vmlasq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
index b9444f2f6a3..e03c91ef298 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlas.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo (uint16x8_t a, uint16x8_t b, uint16_t c)
+foo (uint16x8_t m1, uint16x8_t m2, uint16_t add)
 {
-  return vmlasq_n_u16 (a, b, c);
+  return vmlasq_n_u16 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlas.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
-foo1 (uint16x8_t a, uint16x8_t b, uint16_t c)
+foo1 (uint16x8_t m1, uint16x8_t m2, uint16_t add)
 {
-  return vmlasq (a, b, c);
+  return vmlasq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmlas.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t m1, uint16x8_t m2)
+{
+  return vmlasq (m1, m2, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
index 5708a0658a6..b80c3c7631f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlas.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo (uint32x4_t a, uint32x4_t b, uint32_t c)
+foo (uint32x4_t m1, uint32x4_t m2, uint32_t add)
 {
-  return vmlasq_n_u32 (a, b, c);
+  return vmlasq_n_u32 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlas.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
-foo1 (uint32x4_t a, uint32x4_t b, uint32_t c)
+foo1 (uint32x4_t m1, uint32x4_t m2, uint32_t add)
 {
-  return vmlasq (a, b, c);
+  return vmlasq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmlas.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t m1, uint32x4_t m2)
+{
+  return vmlasq (m1, m2, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
index d83940c7232..0f37550160e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmlas.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo (uint8x16_t a, uint8x16_t b, uint8_t c)
+foo (uint8x16_t m1, uint8x16_t m2, uint8_t add)
 {
-  return vmlasq_n_u8 (a, b, c);
+  return vmlasq_n_u8 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmlas.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
-foo1 (uint8x16_t a, uint8x16_t b, uint8_t c)
+foo1 (uint8x16_t m1, uint8x16_t m2, uint8_t add)
 {
-  return vmlasq (a, b, c);
+  return vmlasq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vmlas.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmlas.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t m1, uint8x16_t m2)
+{
+  return vmlasq (m1, m2, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1


* [PATCH 27/35] arm: improve tests for vqaddq_m*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (25 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 26/35] arm: improve tests for vmlasq* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 16:57   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 28/35] arm: improve tests for vqdmlahq_m* Andrea Corallo
                   ` (8 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vqaddq_m_n_s16.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_n_s32.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_n_s8.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_n_u16.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_n_u32.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_n_u8.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_s16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_s32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_s8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_u16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_u32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_m_u8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vqaddq_n_s16.c         | 16 ++++++-
 .../arm/mve/intrinsics/vqaddq_n_s32.c         | 16 ++++++-
 .../arm/mve/intrinsics/vqaddq_n_s8.c          | 16 ++++++-
 .../arm/mve/intrinsics/vqaddq_n_u16.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vqaddq_n_u32.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vqaddq_n_u8.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vqaddq_s16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vqaddq_s32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vqaddq_s8.c | 16 ++++++-
 .../arm/mve/intrinsics/vqaddq_u16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vqaddq_u32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vqaddq_u8.c | 16 ++++++-
 24 files changed, 516 insertions(+), 72 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
index 65d3f770fe2..a659373d441 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqaddq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
index 4499a0eaa41..8ffc6a67762 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqaddq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
index d3e1d555cb1..2e88b7fabac 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vqaddq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
index baadfe72e8d..61cf9fcf2aa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vqaddq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vqaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
index 80808777d9a..bbd255ac1f1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vqaddq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vqaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
index 32f2894422d..9cee8c65333 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vqaddq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vqaddq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
index d5b7fa63f6a..8bb8a957423 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqaddq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
index 015bc3eb206..9959724fc11 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqaddq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
index b241fddd069..6b918978880 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vqaddq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
index fa752355d64..c0a8d9ba9c8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vqaddq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
index 0729b6bb30f..7a72ce57840 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vqaddq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
index f1541658399..f7e6ca9b5a4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vqaddq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vqaddq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
index 5eeda2bc2dd..0fac7abeac0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vqaddq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
index 5b914d18b98..d750b1f2c14 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vqaddq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c
index 06f22c2b8df..5fc796edf75 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vqaddq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c
index 5403f0b6646..decad65c188 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vqaddq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vqadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vqaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c
index 77185808a16..b0a6d79093e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vqaddq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vqadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vqaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c
index f0fa9bf3f5d..f9ca9a1f042 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vqaddq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vqadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vqaddq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c
index 83cd3475a6f..ffa31463372 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vqaddq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c
index d26dd206912..c5937a967ff 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vqaddq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c
index de03264b4cc..9f937512811 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vqaddq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c
index cd4efc1dd7c..aa4be43f244 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vqaddq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c
index 8b3afb4bd04..daef60eb5ca 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vqaddq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c
index da2ff1bb25c..e28807ec708 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vqaddq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vqaddq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqadd.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 28/35] arm: improve tests for vqdmlahq_m*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (26 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 27/35] arm: improve tests for vqaddq_m* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 16:57   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 29/35] arm: improve tests for vqdmul* Andrea Corallo
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: Likewise.
---
 .../arm/mve/intrinsics/vqdmlahq_m_n_s16.c     | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqdmlahq_m_n_s32.c     | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqdmlahq_m_n_s8.c      | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqdmlahq_n_s16.c       | 24 +++++++++----
 .../arm/mve/intrinsics/vqdmlahq_n_s32.c       | 24 +++++++++----
 .../arm/mve/intrinsics/vqdmlahq_n_s8.c        | 24 +++++++++----
 .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c    | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c    | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c     | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqdmlashq_n_s16.c      | 24 +++++++++----
 .../arm/mve/intrinsics/vqdmlashq_n_s32.c      | 24 +++++++++----
 .../arm/mve/intrinsics/vqdmlashq_n_s8.c       | 24 +++++++++----
 12 files changed, 264 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
index d8c4f4bab8e..94d93874542 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m_n_s16 (a, b, c, p);
+  return vqdmlahq_m_n_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m (a, b, c, p);
+  return vqdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
index 361f5d00bdf..a3dab7fa02e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m_n_s32 (a, b, c, p);
+  return vqdmlahq_m_n_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m (a, b, c, p);
+  return vqdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c
index a9eaea89ba4..610580478a3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m_n_s8 (a, b, c, p);
+  return vqdmlahq_m_n_s8 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo1 (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m (a, b, c, p);
+  return vqdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c
index c109dd47444..210bacec2fb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c)
+foo (int16x8_t add, int16x8_t m1, int16_t m2)
 {
-  return vqdmlahq_n_s16 (a, b, c);
+  return vqdmlahq_n_s16 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqdmlah.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c)
+foo1 (int16x8_t add, int16x8_t m1, int16_t m2)
 {
-  return vqdmlahq (a, b, c);
+  return vqdmlahq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqdmlah.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c
index 752d9d9e3e0..dbb2494b216 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c)
+foo (int32x4_t add, int32x4_t m1, int32_t m2)
 {
-  return vqdmlahq_n_s32 (a, b, c);
+  return vqdmlahq_n_s32 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqdmlah.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c)
+foo1 (int32x4_t add, int32x4_t m1, int32_t m2)
 {
-  return vqdmlahq (a, b, c);
+  return vqdmlahq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqdmlah.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c
index 8dffa0e1852..a7962f82d38 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c)
+foo (int8x16_t add, int8x16_t m1, int8_t m2)
 {
-  return vqdmlahq_n_s8 (a, b, c);
+  return vqdmlahq_n_s8 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqdmlah.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c)
+foo1 (int8x16_t add, int8x16_t m1, int8_t m2)
 {
-  return vqdmlahq (a, b, c);
+  return vqdmlahq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqdmlah.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
index 7c2e5cf89dd..34d407f0142 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vqdmlashq_m_n_s16 (a, b, c, p);
+  return vqdmlashq_m_n_s16 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlasht.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vqdmlashq_m (a, b, c, p);
+  return vqdmlashq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlasht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
index cea9d9b683f..50a665ea7e5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vqdmlashq_m_n_s32 (a, b, c, p);
+  return vqdmlashq_m_n_s32 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlasht.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vqdmlashq_m (a, b, c, p);
+  return vqdmlashq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlasht.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
index 83ee258876a..45f34b60382 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
 {
-  return vqdmlashq_m_n_s8 (a, b, c, p);
+  return vqdmlashq_m_n_s8 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlasht.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo1 (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
 {
-  return vqdmlashq_m (a, b, c, p);
+  return vqdmlashq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlasht.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
index c71a61c54f6..a3f1ae8d6b8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmlash.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c)
+foo (int16x8_t m1, int16x8_t m2, int16_t add)
 {
-  return vqdmlashq_n_s16 (a, b, c);
+  return vqdmlashq_n_s16 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vqdmlash.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmlash.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c)
+foo1 (int16x8_t m1, int16x8_t m2, int16_t add)
 {
-  return vqdmlashq (a, b, c);
+  return vqdmlashq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vqdmlash.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
index 61f6c6671cc..cf867e56874 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmlash.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c)
+foo (int32x4_t m1, int32x4_t m2, int32_t add)
 {
-  return vqdmlashq_n_s32 (a, b, c);
+  return vqdmlashq_n_s32 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vqdmlash.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmlash.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c)
+foo1 (int32x4_t m1, int32x4_t m2, int32_t add)
 {
-  return vqdmlashq (a, b, c);
+  return vqdmlashq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vqdmlash.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c
index a07892863c1..7e9362cab60 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmlash.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c)
+foo (int8x16_t m1, int8x16_t m2, int8_t add)
 {
-  return vqdmlashq_n_s8 (a, b, c);
+  return vqdmlashq_n_s8 (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vqdmlash.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmlash.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c)
+foo1 (int8x16_t m1, int8x16_t m2, int8_t add)
 {
-  return vqdmlashq (a, b, c);
+  return vqdmlashq (m1, m2, add);
 }
 
-/* { dg-final { scan-assembler "vqdmlash.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 29/35] arm: improve tests for vqdmul*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (27 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 28/35] arm: improve tests for vqdmlahq_m* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 16:58   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 30/35] arm: improve tests for vqrdmlahq* Andrea Corallo
                   ` (6 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c: Improve tests.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c: Likewise.
---
 .../arm/mve/intrinsics/vqdmulhq_m_n_s16.c     | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulhq_m_n_s32.c     | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulhq_m_n_s8.c      | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulhq_m_s16.c       | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulhq_m_s32.c       | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulhq_m_s8.c        | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulhq_n_s16.c       | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulhq_n_s32.c       | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulhq_n_s8.c        | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulhq_s16.c         | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulhq_s32.c         | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulhq_s8.c          | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmullbq_m_n_s16.c    | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmullbq_m_n_s32.c    | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmullbq_m_s16.c      | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmullbq_m_s32.c      | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmullbq_n_s16.c      | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmullbq_n_s32.c      | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmullbq_s16.c        | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmullbq_s32.c        | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulltq_m_n_s16.c    | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulltq_m_n_s32.c    | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulltq_m_s16.c      | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulltq_m_s32.c      | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vqdmulltq_n_s16.c      | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulltq_n_s32.c      | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulltq_s16.c        | 16 ++++++++++--
 .../arm/mve/intrinsics/vqdmulltq_s32.c        | 16 ++++++++++--
 28 files changed, 504 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
index 57ab85eaf52..a5c1a106205 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmulhq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c
index 256353a0a21..c78d4db1591 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqdmulhq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c
index c24be9ed5ad..b5ab6eb292c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vqdmulhq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c
index 49efeefcf63..2f5fb0e53a4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqdmulhq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c
index a5614830622..80a938a8a5b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqdmulhq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c
index 2e016f57e35..bfb755af4ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulht.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vqdmulhq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c
index 19534b60b27..e34689d203d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmulh.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vqdmulhq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmulh.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vqdmulhq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c
index eff9f6ecc4b..f967b8a286a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmulh.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vqdmulhq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmulh.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vqdmulhq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c
index 188cf7c616f..5e1928fd51b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmulh.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vqdmulhq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmulh.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vqdmulhq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c
index 513a30f67e6..7c0a434e48f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmulh.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vqdmulhq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmulh.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vqdmulhq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c
index 9cf147dc7c5..19f4b03f6f0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmulh.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vqdmulhq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmulh.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vqdmulhq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c
index 87211ad054a..1784c967f3c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmulh.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vqdmulhq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmulh.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vqdmulhq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c
index f0a4ad5b9f4..4f96e192732 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmullbq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmullbq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c
index 1c7b2e4a1fc..d0bca6e3015 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqdmullbq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqdmullbq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c
index 6a056cf86a1..8448cdc88cf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqdmullbq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqdmullbq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c
index 019c536e7f2..48cddcd791e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqdmullbq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmullbt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqdmullbq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c
index ec501c34539..cd7c394139d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullb.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int16x8_t a, int16_t b)
 {
   return vqdmullbq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullb.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vqdmullbq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c
index 78fe3d6b289..b4d82f55987 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullb.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int32x4_t a, int32_t b)
 {
   return vqdmullbq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullb.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vqdmullbq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c
index 9a423d3cc66..6f0fdabf67f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullb.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vqdmullbq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullb.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vqdmullbq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c
index f0278cd8a86..2bf952bfd77 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullb.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vqdmullbq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullb.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vqdmullbq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
index 85f03149da4..6c756ebf3e7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmulltq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmulltq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
index 6bb5004e201..e46f6b2c384 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqdmulltq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqdmulltq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
index a85393b5bc1..8526b3ad628 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqdmulltq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqdmulltq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
index 82f25b2ebbe..809e0740e46 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqdmulltq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqdmulltt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqdmulltq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
index f9ad32a8411..44f0036bc51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int16x8_t a, int16_t b)
 {
   return vqdmulltq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vqdmulltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
index 311b023431e..b025886ff15 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int32x4_t a, int32_t b)
 {
   return vqdmulltq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vqdmulltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
index 851f27a63b6..95084876349 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vqdmulltq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vqdmulltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
index 1e81cc3dea5..ab27aeddc29 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqdmullt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vqdmulltq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqdmullt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vqdmulltq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 30/35] arm: improve tests for vqrdmlahq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (28 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 29/35] arm: improve tests for vqdmul* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 17:01   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 31/35] arm: improve tests for vqrdmlashq_m* Andrea Corallo
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c: Likewise.
---
 .../arm/mve/intrinsics/vqrdmlahq_m_n_s16.c    | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqrdmlahq_m_n_s32.c    | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqrdmlahq_m_n_s8.c     | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqrdmlahq_n_s16.c      | 24 +++++++++----
 .../arm/mve/intrinsics/vqrdmlahq_n_s32.c      | 24 +++++++++----
 .../arm/mve/intrinsics/vqrdmlahq_n_s8.c       | 24 +++++++++----
 6 files changed, 132 insertions(+), 42 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
index 70c3fa0e9b1..07d689279ac 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
 {
-  return vqrdmlahq_m_n_s16 (a, b, c, p);
+  return vqrdmlahq_m_n_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
 {
-  return vqrdmlahq_m (a, b, c, p);
+  return vqrdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
index 75ed9911276..3b02ca16038 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
 {
-  return vqrdmlahq_m_n_s32 (a, b, c, p);
+  return vqrdmlahq_m_n_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
 {
-  return vqrdmlahq_m (a, b, c, p);
+  return vqrdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
index ddaea545f40..b661bdcb4cf 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
 {
-  return vqrdmlahq_m_n_s8 (a, b, c, p);
+  return vqrdmlahq_m_n_s8 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlaht.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo1 (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
 {
-  return vqrdmlahq_m (a, b, c, p);
+  return vqrdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlaht.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
index 45e74971838..16804735b32 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqrdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c)
+foo (int16x8_t add, int16x8_t m1, int16_t m2)
 {
-  return vqrdmlahq_n_s16 (a, b, c);
+  return vqrdmlahq_n_s16 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqrdmlah.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqrdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c)
+foo1 (int16x8_t add, int16x8_t m1, int16_t m2)
 {
-  return vqrdmlahq (a, b, c);
+  return vqrdmlahq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqrdmlah.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
index 79bb9c98b12..d7d3dc06d7f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqrdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c)
+foo (int32x4_t add, int32x4_t m1, int32_t m2)
 {
-  return vqrdmlahq_n_s32 (a, b, c);
+  return vqrdmlahq_n_s32 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqrdmlah.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqrdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c)
+foo1 (int32x4_t add, int32x4_t m1, int32_t m2)
 {
-  return vqrdmlahq (a, b, c);
+  return vqrdmlahq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqrdmlah.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
index 220518ae698..d3f9f25f11c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqrdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c)
+foo (int8x16_t add, int8x16_t m1, int8_t m2)
 {
-  return vqrdmlahq_n_s8 (a, b, c);
+  return vqrdmlahq_n_s8 (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqrdmlah.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqrdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c)
+foo1 (int8x16_t add, int8x16_t m1, int8_t m2)
 {
-  return vqrdmlahq (a, b, c);
+  return vqrdmlahq (add, m1, m2);
 }
 
-/* { dg-final { scan-assembler "vqrdmlah.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 31/35] arm: improve tests for vqrdmlashq_m*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (29 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 30/35] arm: improve tests for vqrdmlahq* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 17:02   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 32/35] arm: improve tests for vqsubq* Andrea Corallo
                   ` (4 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c: Likewise.
---
 .../arm/mve/intrinsics/vqrdmlashq_m_n_s16.c   | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqrdmlashq_m_n_s32.c   | 34 ++++++++++++++-----
 .../arm/mve/intrinsics/vqrdmlashq_m_n_s8.c    | 34 ++++++++++++++-----
 3 files changed, 78 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
index 35b9618ca47..da4d724bb46 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vqrdmlashq_m_n_s16 (a, b, c, p);
+  return vqrdmlashq_m_n_s16 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vqrdmlashq_m (a, b, c, p);
+  return vqrdmlashq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
index 8517835eb61..2430f1cb102 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vqrdmlashq_m_n_s32 (a, b, c, p);
+  return vqrdmlashq_m_n_s32 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vqrdmlashq_m (a, b, c, p);
+  return vqrdmlashq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
index e42cc63fa74..30915b24e5e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
 {
-  return vqrdmlashq_m_n_s8 (a, b, c, p);
+  return vqrdmlashq_m_n_s8 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlasht.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqrdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
-foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
+foo1 (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
 {
-  return vqrdmlashq_m (a, b, c, p);
+  return vqrdmlashq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqrdmlasht.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 32/35] arm: improve tests for vqsubq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (30 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 31/35] arm: improve tests for vqrdmlashq_m* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 17:03   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq* Andrea Corallo
                   ` (3 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vqsubq_m_n_s16.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_n_s32.c       | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_n_s8.c        | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_n_u16.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_n_u32.c       | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_n_u8.c        | 42 +++++++++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_s16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_s32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_s8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_u16.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_u32.c         | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_m_u8.c          | 26 ++++++++++--
 .../arm/mve/intrinsics/vqsubq_n_s16.c         | 16 ++++++-
 .../arm/mve/intrinsics/vqsubq_n_s32.c         | 16 ++++++-
 .../arm/mve/intrinsics/vqsubq_n_s8.c          | 16 ++++++-
 .../arm/mve/intrinsics/vqsubq_n_u16.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vqsubq_n_u32.c         | 28 ++++++++++++-
 .../arm/mve/intrinsics/vqsubq_n_u8.c          | 28 ++++++++++++-
 .../arm/mve/intrinsics/vqsubq_s16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vqsubq_s32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vqsubq_s8.c | 16 ++++++-
 .../arm/mve/intrinsics/vqsubq_u16.c           | 16 ++++++-
 .../arm/mve/intrinsics/vqsubq_u32.c           | 16 ++++++-
 .../gcc.target/arm/mve/intrinsics/vqsubq_u8.c | 16 ++++++-
 24 files changed, 516 insertions(+), 72 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
index abcff4f0e3c..39b8089919d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
index 23e59ff12a2..ed6b92ddcf5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c
index d783ab55f65..c69ed2aeb84 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c
index 5244efb340c..57ba7428bef 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vqsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c
index 4427f87f456..eda9e74309d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vqsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c
index 0abfa5dc132..f6f61b52f52 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c
@@ -1,23 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
+{
+  return vqsubq_m (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c
index faa189f8466..1a8ea29e83e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqsubq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c
index 62a4dd0979f..c49b7497f6d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqsubq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c
index 71fb6f5632e..17d6471bcd9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vqsubq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c
index 68d642dfef5..0ce93fdf9be 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vqsubq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c
index 8f76c5f47da..1eac57545b3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vqsubq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c
index af335ae9752..56bdda2da6e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vqsubq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vqsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c
index 33a79180289..b9a46f5ff6f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16_t b)
 {
   return vqsubq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c
index a2b338839fa..732e6c01b78 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vqsubq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c
index e8d7e99d19d..fb3c4404fba 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8_t b)
 {
   return vqsubq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c
index f7b48c546a6..aa09d1831e0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16_t b)
 {
   return vqsubq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u16"  }  } */
+/*
+**foo2:
+**	...
+**	vqsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t a)
+{
+  return vqsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c
index f74a968f5a7..19b62e3a8a5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32_t b)
 {
   return vqsubq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vqsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t a)
+{
+  return vqsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c
index ce7b4ce0151..c8eeb38b266 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c
@@ -1,21 +1,45 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8_t b)
 {
   return vqsubq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u8"  }  } */
+/*
+**foo2:
+**	...
+**	vqsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t a)
+{
+  return vqsubq (a, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c
index 85bf265eeb0..6c66b4d75d8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vqsubq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c
index 35d17e8bc4e..8432197b9e8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vqsubq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c
index 50cfccff7a5..ad16cae08bc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vqsubq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c
index 15f0b7244b7..264df1a0398 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, uint16x8_t b)
 {
   return vqsubq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c
index 7d695e23474..a4bf15cd9df 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, uint32x4_t b)
 {
   return vqsubq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c
index c0552d100d4..1804d6484e2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vqsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, uint8x16_t b)
 {
   return vqsubq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vqsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
   return vqsubq (a, b);
 }
 
-/* { dg-final { scan-assembler "vqsub.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (31 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 32/35] arm: improve tests for vqsubq* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 17:03   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 34/35] arm: improve tests for vrshlq* Andrea Corallo
                   ` (2 subsequent siblings)
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/ChangeLog:

	* config/arm/mve.md (mve_vrmlaldavhq_<supf>v4si,
	mve_vrmlaldavhaq_<supf>v4si): Separate the mnemonic from its
	operands with a tab instead of a space.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c: Likewise.
---
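The mve.md change below replaces the space after the mnemonic with a tab.
Expected-output lines in the style used by this series put a tab after the
mnemonic, so an instruction printed with a space would not match such a
line. The following is only a minimal sketch of that effect, written in
Python rather than the Tcl used by the testsuite. The pattern for the
unpredicated vrmlaldavha.s32 form, the escaped dot, the helper name
`matches` and the register names are assumptions made for illustration;
they are not taken from the tests in this patch.

```python
import re

# Hypothetical expected-output line, modelled on the patterns in this
# series: a literal tab separates the mnemonic from its operands, and an
# optional tab-prefixed "@ ..." assembler comment may follow.
PATTERN = re.compile(
    r"\tvrmlaldavha\.s32\t(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), "
    r"q[0-9]+, q[0-9]+(?:\t@.*|)"
)


def matches(asm_line: str) -> bool:
    """Return True if the whole assembly line matches the expected pattern."""
    return PATTERN.fullmatch(asm_line) is not None


# Illustrative assembly lines; the register choices are made up.
old_output = "\tvrmlaldavha.s32 r0, r1, q0, q1"   # space after the mnemonic
new_output = "\tvrmlaldavha.s32\tr0, r1, q0, q1"  # tab after the mnemonic

print(matches(old_output))  # False
print(matches(new_output))  # True
```

With the space-separated output string the pattern does not match; with the
tab-separated one it does.
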
 gcc/config/arm/mve.md                         |  4 +-
 .../arm/mve/intrinsics/vrmlaldavhaq_p_s32.c   | 24 ++++++++++-
 .../arm/mve/intrinsics/vrmlaldavhaq_p_u32.c   | 40 ++++++++++++++++++-
 3 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index d2ffae6a425..b5e6da4b133 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2543,7 +2543,7 @@ (define_insn "mve_vrmlaldavhq_<supf>v4si"
 	 VRMLALDAVHQ))
   ]
   "TARGET_HAVE_MVE"
-  "vrmlaldavh.<supf>32 %Q0, %R0, %q1, %q2"
+  "vrmlaldavh.<supf>32\t%Q0, %R0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2649,7 +2649,7 @@ (define_insn "mve_vrmlaldavhaq_<supf>v4si"
 	 VRMLALDAVHAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vrmlaldavha.<supf>32 %Q0, %R0, %q2, %q3"
+  "vrmlaldavha.<supf>32\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
index 263d3509771..dec4a969dfe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrmlaldavhat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
 {
   return vrmlaldavhaq_p_s32 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrmlaldavhat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int64_t
 foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
 {
   return vrmlaldavhaq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
index 83ab68c001b..f3c8bfd121c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
@@ -1,21 +1,57 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrmlaldavhat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
 {
   return vrmlaldavhaq_p_u32 (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrmlaldavhat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint64_t
 foo1 (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
 {
   return vrmlaldavhaq_p (a, b, c, p);
 }
 
-/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
+/*
+**foo2:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrmlaldavhat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
+uint64_t
+foo2 (uint32x4_t b, uint32x4_t c, mve_pred16_t p)
+{
+  return vrmlaldavhaq_p (1, b, c, p);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 34/35] arm: improve tests for vrshlq*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (32 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 17:04   ` Kyrylo Tkachov
  2022-11-17 16:38 ` [PATCH 35/35] arm: improve tests for vsetq_lane* Andrea Corallo
  2022-11-28  9:20 ` [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c: Improve tests.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c: Likewise.
---
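The predicated tests below all expect the same sequence: a vmsr to p0, then
vpst, then the predicated vrshlt, with "..." lines allowing other
instructions in between. The following is a simplified model of that
matching, written in Python. It is not the Tcl implementation behind
check-function-bodies, and the treatment of "..." as "any number of whole
lines" is an assumption of this sketch. The helper name `body_to_regex` and
the sample assembly are also made up for illustration.

```python
import re


def body_to_regex(expected_lines):
    """Build one regex from expected-output lines (simplified model).

    A line equal to "..." is modelled as zero or more complete lines;
    every other line is a regex for exactly one tab-indented line.
    """
    parts = []
    for line in expected_lines:
        if line == "...":
            parts.append(r"(?:.*\n)*")
        else:
            parts.append("\t" + line + "\n")
    return re.compile("".join(parts))


# Expected body in the shape used for vrshlq_m_n_s16 in this patch.
expected = [
    "...",
    r"vmsr\tp0, (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
    r"vpst(?:\t@.*|)",
    "...",
    r"vrshlt.s16\tq[0-9]+, (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
]

# Illustrative function body; not copied from real compiler output.
asm = (
    "\tvmsr\tp0, r1\t@ movhi\n"
    "\tvpst\n"
    "\tvrshlt.s16\tq0, r0\n"
    "\tbx\tlr\n"
)

# Same body with the vpst line missing.
asm_without_vpst = (
    "\tvmsr\tp0, r1\t@ movhi\n"
    "\tvrshlt.s16\tq0, r0\n"
    "\tbx\tlr\n"
)

regex = body_to_regex(expected)
print(regex.fullmatch(asm) is not None)               # True
print(regex.fullmatch(asm_without_vpst) is not None)  # False
```

Unlike the removed scan-assembler directives, which only looked for a
mnemonic somewhere in the file, this form also constrains the order of the
instructions and their operands within each function.
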
 .../arm/mve/intrinsics/vrshlq_m_n_s16.c       | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_n_s32.c       | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_n_s8.c        | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_n_u16.c       | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_n_u32.c       | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_n_u8.c        | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_s16.c         | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_s32.c         | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_s8.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_u16.c         | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_u32.c         | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_m_u8.c          | 26 ++++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_n_s16.c         | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_n_s32.c         | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_n_s8.c          | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_n_u16.c         | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_n_u32.c         | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_n_u8.c          | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_s16.c           | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_s32.c           | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vrshlq_s8.c | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_u16.c           | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_u32.c           | 16 ++++++++++--
 .../gcc.target/arm/mve/intrinsics/vrshlq_u8.c | 16 ++++++++++--
 .../arm/mve/intrinsics/vrshlq_x_s16.c         | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_x_s32.c         | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_x_s8.c          | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_x_u16.c         | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_x_u32.c         | 25 +++++++++++++++---
 .../arm/mve/intrinsics/vrshlq_x_u8.c          | 25 +++++++++++++++---
 30 files changed, 564 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
index cf51de6aa9c..c7d1f3a5b1c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c
index dcfd99773e3..a8713e6a06a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c
index cc1b746dc0d..8160d1bdb04 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c
index 93a95ba9065..b08f4c076d1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c
index 4b8c82aba21..59f9a13d8c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c
index f1ff9dd33b7..fda65f7c592 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c
index 57f343cd3b9..20c9f5fcd7c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_m_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c
index 2598b1719fd..af7a5158458 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_m_s32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c
index 6e4f1bdddf4..59d283ebb71 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_m_s8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c
index d4d98913b75..e731cb71675 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t inactive, uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_m_u16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t inactive, uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c
index 5d60f1fe799..0379e0455c9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t inactive, uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_m_u32 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t inactive, uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c
index 913ba36c925..1e20486253e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t inactive, uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_m_u8 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t inactive, uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c
index 713c6a218b2..c846e9f06ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int32_t b)
 {
   return vrshlq_n_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int32_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c
index 18906fe44d1..1c6144212f7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32_t b)
 {
   return vrshlq_n_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c
index d5b1286d943..3b9d0a389dc 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int32_t b)
 {
   return vrshlq_n_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int32_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
index 49bb21663d7..77994bd3a29 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int32_t b)
 {
   return vrshlq_n_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int32_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
index 8ed67395b42..82774c794fe 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32_t b)
 {
   return vrshlq_n_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
index ccc6a00b98a..e9badb7297e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int32_t b)
 {
   return vrshlq_n_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int32_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
index c28ad31c6f9..4a64fc7b410 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b)
 {
   return vrshlq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
index 2e279b6fb0a..c5cbe266c0f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b)
 {
   return vrshlq_s32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
index 4d18419d1bf..85305921f9a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b)
 {
   return vrshlq_s8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.s8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
index e0a9ea9cebc..905a18c4f20 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int16x8_t b)
 {
   return vrshlq_u16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int16x8_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
index 788a4b1b6fa..16c7578df39 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32x4_t b)
 {
   return vrshlq_u32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32x4_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u32"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
index d860e9cccb9..8bf21eeaef5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
@@ -1,21 +1,33 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vrshl.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int8x16_t b)
 {
   return vrshlq_u8 (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vrshl.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int8x16_t b)
 {
   return vrshlq (a, b);
 }
 
-/* { dg-final { scan-assembler "vrshl.u8"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
index 800a1e8e48f..4dfb6a65842 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_x_s16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
index 921072a44c9..7f1f6dbb760 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_x_s32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
index 217b257ed24..69bf0a50fa6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_x_s8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
index 5c0cad9ec89..b5a89892070 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_x_u16 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo1 (uint16x8_t a, int16x8_t b, mve_pred16_t p)
 {
   return vrshlq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
index 2754d20841c..59ab2662021 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_x_u32 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo1 (uint32x4_t a, int32x4_t b, mve_pred16_t p)
 {
   return vrshlq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
index 46dada44559..b81d8d03da4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_x_u8 (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+**	vpst(?:	@.*|)
+**	...
+**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo1 (uint8x16_t a, int8x16_t b, mve_pred16_t p)
 {
   return vrshlq_x (a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* [PATCH 35/35] arm: improve tests for vsetq_lane*
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (33 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 34/35] arm: improve tests for vrshlq* Andrea Corallo
@ 2022-11-17 16:38 ` Andrea Corallo
  2022-11-22 17:06   ` Kyrylo Tkachov
  2022-11-28  9:20 ` [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
  35 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-17 16:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vsetq_lane_f16.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_f32.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_s16.c       | 24 ++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_s32.c       | 24 ++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_s64.c       | 27 ++++++++++---
 .../arm/mve/intrinsics/vsetq_lane_s8.c        | 24 ++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_u16.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_u32.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_u64.c       | 39 ++++++++++++++++---
 .../arm/mve/intrinsics/vsetq_lane_u8.c        | 36 +++++++++++++++--
 10 files changed, 284 insertions(+), 34 deletions(-)
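
For readers less familiar with the expected-body lines used in the tests
below, here is a minimal sketch (not part of the patch) of what one such
line is meant to accept. It uses Python's re module as a stand-in for the
Tcl regexps that check-function-bodies applies, and it matches a single
line in isolation, which only approximates what the harness does. The
sample assembly lines are invented for illustration and are not real
compiler output.

```python
import re

# Pattern taken from the vsetq_lane_s16 expected body in this patch.
# The literal tab characters of the test file are written as \t here.
PATTERN = r"\tvmov.16\tq[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:\t@.*|)"


def line_matches(asm_line: str) -> bool:
    """Return True if the whole assembly line matches PATTERN."""
    return re.fullmatch(PATTERN, asm_line) is not None


# Hypothetical assembly lines, made up for this illustration.
samples = [
    "\tvmov.16\tq0[1], r0",           # plain form
    "\tvmov.16\tq0[1], r0\t@ int16",  # trailing assembler comment
    "\tvmov.16\tq0[1], ip",           # ip as the source register
    "\tvmov.32\tq0[1], r0",           # wrong element size
]
results = [line_matches(s) for s in samples]
```

Under these assumptions the first three samples match and the last one
does not: the optional "(?:\t@.*|)" tail is what lets a trailing
assembler comment through, and the "(?:ip|fp|r[0-9]+)" alternation is
what keeps the test independent of register allocation.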

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
index e03e9620528..b5c9f4d5eb8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16_t a, float16x8_t b)
 {
-    return vsetq_lane_f16 (a, b, 0);
+  return vsetq_lane_f16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 (float16_t a, float16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t b)
+{
+  return vsetq_lane (1.1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
index 2b9f1a7e627..211083ce5d4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32_t a, float32x4_t b)
 {
-    return vsetq_lane_f32 (a, b, 0);
+  return vsetq_lane_f32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 (float32_t a, float32x4_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t b)
+{
+  return vsetq_lane (1.1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
index 92ad0dd16a8..9cdaeae1e74 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
@@ -1,15 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16_t a, int16x8_t b)
 {
-    return vsetq_lane_s16 (a, b, 0);
+  return vsetq_lane_s16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int16x8_t
+foo1 (int16_t a, int16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
index e60c8f26700..edd06bce1bd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
@@ -1,15 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32_t a, int32x4_t b)
 {
-    return vsetq_lane_s32 (a, b, 0);
+  return vsetq_lane_s32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int32x4_t
+foo1 (int32_t a, int32x4_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
index 430df669f2a..95ba4da1f51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
@@ -1,16 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
-/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-mfloat-abi=hard -O2" } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int64_t a, int64x2_t b)
 {
-    return vsetq_lane_s64 (a, b, 0);
+  return vsetq_lane_s64 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int64x2_t
+foo1 (int64_t a, int64x2_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
index d8ccbb524fd..f5bf0dd663b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
@@ -1,15 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8_t a, int8x16_t b)
 {
-    return vsetq_lane_s8 (a, b, 0);
+  return vsetq_lane_s8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int8x16_t
+foo1 (int8_t a, int8x16_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
index 156a5d1de1b..33944dcbd45 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16_t a, uint16x8_t b)
 {
-    return vsetq_lane_u16 (a, b, 0);
+  return vsetq_lane_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (uint16_t a, uint16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
index e9575483cc9..8f9a3a78cc5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, uint32x4_t b)
 {
-    return vsetq_lane_u32 (a, b, 0);
+  return vsetq_lane_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (uint32_t a, uint32x4_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
index 0e040121cf0..5ce4c544c25 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
@@ -1,16 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
-/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-mfloat-abi=hard -O2" } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint64x2_t
 foo (uint64_t a, uint64x2_t b)
 {
-    return vsetq_lane_u64 (a, b, 0);
+  return vsetq_lane_u64 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint64x2_t
+foo1 (uint64_t a, uint64x2_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint64x2_t
+foo2 (uint64x2_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
index 668b3fea953..d37021c91b0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8_t a, uint8x16_t b)
 {
-    return vsetq_lane_u8 (a, b, 0);
+  return vsetq_lane_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (uint8_t a, uint8x16_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* RE: [PATCH 01/35] arm: improve vcreateq* tests
  2022-11-17 16:37 ` [PATCH 01/35] arm: improve vcreateq* tests Andrea Corallo
@ 2022-11-18  9:47   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18  9:47 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 01/35] arm: improve vcreateq* tests
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Likewise.
> ---
>  .../arm/mve/intrinsics/vcreateq_f16.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_f32.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_s16.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_s32.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_s64.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_s8.c          | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_u16.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_u32.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_u64.c         | 23 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vcreateq_u8.c          | 23 ++++++++++++++++++-
>  10 files changed, 220 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> index fb3601edb94..c39303daa03 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...

Eventually I'd like to see these tests tightened to match more specific codegen, at least for the tests that have only one intrinsic call in their body. I appreciate, though, that the codegen for many of these is still immature and that there are softfp/hard ABI differences as well.
This patch is definitely an improvement over what's there now, so OK.
Thanks,
Kyrill
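
A side note on the quoted patterns, offered as a sketch rather than as
part of the review: the register classes in the vcreateq tests are
written "[0-9+]", which is a single-character class (one digit or a
literal "+"), whereas other tests in this series use "[0-9]+" (one or
more digits). The snippet below uses Python's re module as a stand-in
for the Tcl regexps used by the testsuite, and the register names are
illustrative inputs only.

```python
import re

# Register pattern as written in the quoted vcreateq tests ...
AS_WRITTEN = r"r[0-9+]"
# ... and the form used by other tests in this series.
ONE_OR_MORE = r"r[0-9]+"


def full(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# For each input: (matches AS_WRITTEN, matches ONE_OR_MORE).
checks = {
    "r3": (full(AS_WRITTEN, "r3"), full(ONE_OR_MORE, "r3")),
    "r10": (full(AS_WRITTEN, "r10"), full(ONE_OR_MORE, "r10")),
    "r+": (full(AS_WRITTEN, "r+"), full(ONE_OR_MORE, "r+")),
}
```

Both forms accept single-digit registers such as r3, so the difference
only shows up if a two-digit register such as r10 is ever allocated.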

> +*/
>  float16x8_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +float16x8_t
> +foo1 ()
> +{
> +  return vcreateq_f16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> index 4f4da62eed7..ad66f4407cd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  float32x4_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +float32x4_t
> +foo1 ()
> +{
> +  return vcreateq_f32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> index 103be6310bd..7e70a486513 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  int16x8_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +int16x8_t
> +foo1 ()
> +{
> +  return vcreateq_s16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
> index 96f7a972d93..ffcfc80ff40 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  int32x4_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +int32x4_t
> +foo1 ()
> +{
> +  return vcreateq_s32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
> index 74c554506c0..26642f9cd68 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  int64x2_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_s64 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +int64x2_t
> +foo1 ()
> +{
> +  return vcreateq_s64 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
> index 03c50a0928a..7e7e4d5948d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  int8x16_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +int8x16_t
> +foo1 ()
> +{
> +  return vcreateq_s8 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
> index 411cec8471e..858a3a4546f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  uint16x8_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +uint16x8_t
> +foo1 ()
> +{
> +  return vcreateq_u16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
> index 8bc8f60640e..5f27cf68845 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  uint32x4_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +uint32x4_t
> +foo1 ()
> +{
> +  return vcreateq_u32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
> index e74641c32f3..78553dec701 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  uint64x2_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_u64 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +uint64x2_t
> +foo1 ()
> +{
> +  return vcreateq_u64 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
> index de79f471d63..4a8ab61f865 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
>  uint8x16_t
>  foo (uint64_t a, uint64_t b)
>  {
>    return vcreateq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**	vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**	...
> +*/
> +uint8x16_t
> +foo1 ()
> +{
> +  return vcreateq_u8 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1
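
Editorial note on the register patterns quoted above: in the `vmov` lines the
quantifier sits inside the bracket expression (`q[0-9+]`, `r[0-9+]`), so each
one matches exactly one character that is either a digit or a literal `+`.
Single-digit names such as `r0` still match, but a two-digit name such as
`r10` would not; the vddupq* tests later in this series write `r[0-9]+`. The
following is a minimal sketch of the difference, not part of the posted patch.
It uses Python's `re` as a stand-in for the Tcl regexps that
check-function-bodies evaluates, on the assumption that both engines treat
this bracket expression the same way; the register names are illustrative.

```python
import re

# Pattern as posted: the '+' is inside the bracket expression,
# so it is a literal alternative, not a quantifier.
posted = re.compile(r"r[0-9+]")

# Assumed intent: 'r' followed by one or more digits.
intended = re.compile(r"r[0-9]+")

# Compare both patterns on a few illustrative register spellings.
for reg in ("r0", "r9", "r10", "r+"):
    print(reg, bool(posted.fullmatch(reg)), bool(intended.fullmatch(reg)))
```

With whole-operand matching, only the second form accepts `r10`, and only the
first accepts the nonsensical `r+`.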



* RE: [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization
  2022-11-17 16:37 ` [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization Andrea Corallo
@ 2022-11-18 16:33   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:33 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization
> 
> gcc/ChangeLog:
> 
> 	* config/arm/vfp.md (*thumb2_movhi_vfp, *thumb2_movhi_fp16): Fix
> 	'vmsr' spacing and reg capitalization.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:
> 	Update test.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c:
> 	Likewise.
> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c:
> 	Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/vfp.md                                     | 8 ++++----
>  .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c      | 2 +-
>  .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c      | 2 +-
>  .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c      | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index d0f423cc3c5..932e4b7447e 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -105,9 +105,9 @@ (define_insn "*thumb2_movhi_vfp"
>      case 8:
>        return "vmov%?.f32\t%0, %1\t%@ int";
>      case 9:
> -      return "vmsr%?\t P0, %1\t@ movhi";
> +      return "vmsr%?\tp0, %1\t@ movhi";
>      case 10:
> -      return "vmrs%?\t %0, P0\t@ movhi";
> +      return "vmrs%?\t%0, p0\t@ movhi";
>      default:
>        gcc_unreachable ();
>      }
> @@ -209,9 +209,9 @@ (define_insn "*thumb2_movhi_fp16"
>      case 8:
>        return "vmov%?.f32\t%0, %1\t%@ int";
>      case 9:
> -      return "vmsr%?\t P0, %1\t%@ movhi";
> +      return "vmsr%?\tp0, %1\t%@ movhi";
>      case 10:
> -      return "vmrs%?\t%0, P0\t%@ movhi";
> +      return "vmrs%?\t%0, p0\t%@ movhi";
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> index f3219e2e825..1e57ca40739 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> @@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>  }
> 
>  /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
> +/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
>  /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> index 4d093d243fe..f8d77fdfd5b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> @@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>  }
> 
>  /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
> +/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
>  /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> index e796522a49c..8a0e109c70c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> @@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>  }
> 
>  /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
> +/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
>  /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> --
> 2.25.1
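
Editorial note on why the spacing and capitalization matter: the stricter body
patterns used by the vddupq* tests in this series spell the operand as `p0`
directly after a tab, so output shaped like the old template would no longer
match. The following is a minimal sketch of that effect, not part of the posted
patch. It uses Python's `re` as a stand-in for the Tcl regexps used by the
testsuite, and the two sample lines are illustrative strings built from the
old and new vfp.md templates with a hypothetical `r3` operand, not captured
compiler output.

```python
import re

# Line pattern as written in the vddupq* tests of this series:
# tab after the mnemonic, lowercase p0, optional trailing '@' comment.
pattern = re.compile(r"vmsr\tp0, (?:ip|fp|r[0-9]+)(?:\t@.*|)")

# Illustrative lines (hypothetical operand r3).
old_line = "vmsr\t P0, r3\t@ movhi"  # old template: extra space, capital P0
new_line = "vmsr\tp0, r3\t@ movhi"   # fixed template

print(bool(pattern.fullmatch(old_line)))
print(bool(pattern.fullmatch(new_line)))
```

Only the line shaped like the fixed template satisfies the pattern.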



* RE: [PATCH 03/35] arm: improve tests and fix vddupq*
  2022-11-17 16:37 ` [PATCH 03/35] arm: improve tests and fix vddupq* Andrea Corallo
@ 2022-11-18 16:34   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:34 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 03/35] arm: improve tests and fix vddupq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vddupq_u<mode>_insn): Fix 'vddup.u'
> 	spacing.
> 	(mve_vddupq_m_wb_u<mode>_insn): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_n_u16.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c : Likewise.
> 	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c : Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md                         |  4 +-
>  .../arm/mve/intrinsics/vddupq_m_n_u16.c       | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vddupq_m_n_u32.c       | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vddupq_m_n_u8.c        | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vddupq_m_wb_u16.c      | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vddupq_m_wb_u32.c      | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vddupq_m_wb_u8.c       | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vddupq_n_u16.c         | 32 ++++++++++--
>  .../arm/mve/intrinsics/vddupq_n_u32.c         | 28 +++++++++-
>  .../arm/mve/intrinsics/vddupq_n_u8.c          | 28 +++++++++-
>  .../arm/mve/intrinsics/vddupq_wb_u16.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/vddupq_wb_u32.c        | 28 +++++++++-
>  .../arm/mve/intrinsics/vddupq_wb_u8.c         | 28 +++++++++-
>  .../arm/mve/intrinsics/vddupq_x_n_u16.c       | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vddupq_x_n_u32.c       | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vddupq_x_n_u8.c        | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vddupq_x_wb_u16.c      | 52 +++++++++++++++----
>  .../arm/mve/intrinsics/vddupq_x_wb_u32.c      | 52 +++++++++++++++----
>  .../arm/mve/intrinsics/vddupq_x_wb_u8.c       | 52 +++++++++++++++----
>  19 files changed, 642 insertions(+), 96 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 62186f124da..1215f845388 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9043,7 +9043,7 @@ (define_insn "mve_vddupq_u<mode>_insn"
>         (minus:SI (match_dup 2)
>  		 (match_operand:SI 4 "immediate_operand" "i")))]
>   "TARGET_HAVE_MVE"
> - "vddup.u%#<V_sz_elem>  %q0, %1, %3")
> + "vddup.u%#<V_sz_elem>\t%q0, %1, %3")
> 
>  ;;
>  ;; [vddupq_m_n_u])
> @@ -9079,7 +9079,7 @@ (define_insn "mve_vddupq_m_wb_u<mode>_insn"
>         (minus:SI (match_dup 3)
>  		 (match_operand:SI 6 "immediate_operand" "i")))]
>   "TARGET_HAVE_MVE"
> - "vpst\;\tvddupt.u%#<V_sz_elem>\t%q0, %2, %4"
> + "vpst\;vddupt.u%#<V_sz_elem>\t%q0, %2, %4"
>   [(set_attr "length""8")])
> 
>  ;;
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> index 7332711f6a7..7c8b0152763 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vddupq_m_n_u16 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vddupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vddupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
> index 54ad91f2803..810a1a7e21b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_m_n_u32 (inactive, a, 4, p);
> +  return vddupq_m_n_u32 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_m (inactive, a, 4, p);
> +  return vddupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vddupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
> index 3746b5db6e5..6642b9f4b88 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_m_n_u8 (inactive, a, 4, p);
> +  return vddupq_m_n_u8 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_m (inactive, a, 4, p);
> +  return vddupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vddupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
> index 8b5d9e86469..cc6a19516d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
>  {
>    return vddupq_m_wb_u16 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
>  {
>    return vddupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vddupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
> index 7a8c363ac70..cd6c6f86eea 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_m_wb_u32 (inactive, a, 4, p);
> +  return vddupq_m_wb_u32 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_m (inactive, a, 4, p);
> +  return vddupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vddupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
> index 45784a5c9cd..fe186e743da 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_m_wb_u8 (inactive, a, 4, p);
> +  return vddupq_m_wb_u8 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_m (inactive, a, 4, p);
> +  return vddupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vddupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
> index 4684e2af553..2dba2d74b61 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a)
>  {
> -  return vddupq_n_u16 (a, 4);
> +  return vddupq_n_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a)
>  {
> -  return vddupq_u16 (a, 4);
> +  return vddupq_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return vddupq_u16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
> index aeaa83eb6bc..6b5cf6c75b0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a)
>  {
>    return vddupq_n_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a)
>  {
>    return vddupq_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return vddupq_u32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
> index 255a9f80b6b..174e422f4ef 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a)
>  {
>    return vddupq_n_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a)
>  {
>    return vddupq_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return vddupq_u8 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
> index 40fc6cf2197..6a471a7f72f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t *a)
>  {
> -  return vddupq_wb_u16 (a, 4);
> +  return vddupq_wb_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t *a)
>  {
> -  return vddupq_u16 (a, 4);
> +  return vddupq_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vddup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return vddupq_u16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
> index 09b5b1f2f80..debf420d3e8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t *a)
>  {
>    return vddupq_wb_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t *a)
>  {
>    return vddupq_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vddup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return vddupq_u32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
> index 00dfa906748..8e6ef8adccd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t *a)
>  {
>    return vddupq_wb_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t *a)
>  {
>    return vddupq_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vddup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vddup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return vddupq_u8 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
> index 5b0fc0b6340..1aafaf87b82 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
>    return vddupq_x_n_u16 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a, mve_pred16_t p)
>  {
>    return vddupq_x_u16 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vddupq_x_u16 (1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
> index 66def991b65..2e3e268dbee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_x_n_u32 (a, 4, p);
> +  return vddupq_x_n_u32 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_x_u32 (a, 4, p);
> +  return vddupq_x_u32 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vddupq_x_u32 (1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c
> index 8ac322ed52d..bdf563a8074 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_x_n_u8 (a, 4, p);
> +  return vddupq_x_n_u8 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a, mve_pred16_t p)
>  {
> -  return vddupq_x_u8 (a, 4, p);
> +  return vddupq_x_u8 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vddupq_x_u8 (1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c
> index 030048f840a..713d8b731c8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c
> @@ -1,25 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> -uint32_t *a;
> -
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (mve_pred16_t p)
> +foo (uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_x_wb_u16 (a, 2, p);
> +  return vddupq_x_wb_u16 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 (uint32_t *a, mve_pred16_t p)
> +{
> +  return vddupq_x_u16 (a, 1, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (mve_pred16_t p)
> +foo2 (mve_pred16_t p)
>  {
> -  return vddupq_x_u16 (a, 2, p);
> +  return vddupq_x_u16 (1, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c
> index 95bf28e4052..9f484b3b8fb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c
> @@ -1,25 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> -uint32_t *a;
> -
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (mve_pred16_t p)
> +foo (uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_x_wb_u32 (a, 8, p);
> +  return vddupq_x_wb_u32 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 (uint32_t *a, mve_pred16_t p)
> +{
> +  return vddupq_x_u32 (a, 1, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (mve_pred16_t p)
> +foo2 (mve_pred16_t p)
>  {
> -  return vddupq_x_u32 (a, 8, p);
> +  return vddupq_x_u32 (1, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c
> index 2fe81dded55..aa83bfed125 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c
> @@ -1,25 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> -uint32_t *a;
> -
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (mve_pred16_t p)
> +foo (uint32_t *a, mve_pred16_t p)
>  {
> -  return vddupq_x_wb_u8 (a, 8, p);
> +  return vddupq_x_wb_u8 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 (uint32_t *a, mve_pred16_t p)
> +{
> +  return vddupq_x_u8 (a, 1, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vddupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (mve_pred16_t p)
> +foo2 (mve_pred16_t p)
>  {
> -  return vddupq_x_u8 (a, 8, p);
> +  return vddupq_x_u8 (1, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 04/35] arm: improve tests and fix vdwdupq*
  2022-11-17 16:37 ` [PATCH 04/35] arm: improve tests and fix vdwdupq* Andrea Corallo
@ 2022-11-18 16:35   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:35 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 04/35] arm: improve tests and fix vdwdupq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vdwdupq_m_wb_u<mode>_insn): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md                         |  2 +-
>  .../arm/mve/intrinsics/vdwdupq_m_n_u16.c      | 44 ++++++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_m_n_u32.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_m_n_u8.c       | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_m_wb_u16.c     | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_m_wb_u32.c     | 48 +++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_m_wb_u8.c      | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_n_u16.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_n_u32.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_n_u8.c         | 32 ++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_wb_u16.c       | 32 ++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_wb_u32.c       | 32 ++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_wb_u8.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_x_n_u16.c      | 42 ++++++++++++++--
>  .../arm/mve/intrinsics/vdwdupq_x_n_u32.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_x_n_u8.c       | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_x_wb_u16.c     | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_x_wb_u32.c     | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/vdwdupq_x_wb_u8.c      | 50 ++++++++++++++++---
>  19 files changed, 655 insertions(+), 103 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 1215f845388..58ffe03c499 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9195,7 +9195,7 @@ (define_insn "mve_vdwdupq_m_wb_u<mode>_insn"
>  	 VDWDUPQ_M))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;\tvdwdupt.u%#<V_sz_elem>\t%q2, %3, %R4, %5"
> +  "vpst\;vdwdupt.u%#<V_sz_elem>\t%q2, %3, %R4, %5"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
> index 5303fd7d361..8f53f5ef0cb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 1, p);
> +  return vdwdupq_m_n_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vdwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c
> index 9f22bd7f852..30e971fb733 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 4, p);
> +  return vdwdupq_m_n_u32 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 4, p);
> +  return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vdwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c
> index 0591e731958..0abc19a2318 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 4, p);
> +  return vdwdupq_m_n_u8 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 4, p);
> +  return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vdwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
> index e4e7b47e082..b3e6affbf8f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (uint16x8_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 8, p);
> +  return vdwdupq_m_wb_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (uint16x8_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 8, p);
> +  return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vdwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
> index 42917dc9886..60c52b0d850 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (uint32x4_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 1, p);
> +  return vdwdupq_m_wb_u32 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (uint32x4_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vdwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
> index 32c3153ffb3..459321a7984 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (uint8x16_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 2, p);
> +  return vdwdupq_m_wb_u8 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (uint8x16_t inactive, uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 2, p);
> +  return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vdwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c
> index 725a6e4bc0e..9f76dbf35eb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a, uint32_t b)
>  {
> -  return vdwdupq_n_u16 (a, b, 2);
> +  return vdwdupq_n_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a, uint32_t b)
>  {
> -  return vdwdupq_u16 (a, b, 2);
> +  return vdwdupq_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return vdwdupq_u16 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c
> index 6ceaadb984d..962f766b496 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, uint32_t b)
>  {
> -  return vdwdupq_n_u32 (a, b, 8);
> +  return vdwdupq_n_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a, uint32_t b)
>  {
> -  return vdwdupq_u32 (a, b, 8);
> +  return vdwdupq_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return vdwdupq_u32 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c
> index a1712e418be..c73b1b69661 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a, uint32_t b)
>  {
> -  return vdwdupq_n_u8 (a, b, 4);
> +  return vdwdupq_n_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a, uint32_t b)
>  {
> -  return vdwdupq_u8 (a, b, 4);
> +  return vdwdupq_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return vdwdupq_u8 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c
> index 0164ea9502c..3b1968d78aa 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t *a, uint32_t b)
>  {
> -  return vdwdupq_wb_u16 (a, b, 2);
> +  return vdwdupq_wb_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t *a, uint32_t b)
>  {
> -  return vdwdupq_u16 (a, b, 2);
> +  return vdwdupq_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vdwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return vdwdupq_u16 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c
> index 7681371b016..8554f62ee6b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t *a, uint32_t b)
>  {
> -  return vdwdupq_wb_u32 (a, b, 8);
> +  return vdwdupq_wb_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t *a, uint32_t b)
>  {
> -  return vdwdupq_u32 (a, b, 8);
> +  return vdwdupq_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vdwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return vdwdupq_u32 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c
> index 6f60bb09b24..eb91a80daf5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t *a, uint32_t b)
>  {
> -  return vdwdupq_wb_u8 (a, b, 4);
> +  return vdwdupq_wb_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t *a, uint32_t b)
>  {
> -  return vdwdupq_u8 (a, b, 4);
> +  return vdwdupq_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vdwdup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vdwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return vdwdupq_u8 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c
> index ce975267531..9c0fd1e253c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_x_n_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_x_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vdwdupq_x_u16 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c
> index 9ed75d292d8..3107e2fdbbe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_n_u32 (a, b, 4, p);
> +  return vdwdupq_x_n_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_u32 (a, b, 4, p);
> +  return vdwdupq_x_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vdwdupq_x_u32 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c
> index 3705094c4df..03d01e0dd43 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_n_u8 (a, b, 4, p);
> +  return vdwdupq_x_n_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_u8 (a, b, 4, p);
> +  return vdwdupq_x_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vdwdupq_x_u8 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c
> index caf744d7255..f7dca660c03 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_wb_u16 (a, b, 8, p);
> +  return vdwdupq_x_wb_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_u16 (a, b, 8, p);
> +  return vdwdupq_x_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vdwdupq_x_u16 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c
> index 8c8be86bce6..032ae94e8c3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_x_wb_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_x_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vdwdupq_x_u32 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c
> index 1c6ef4ed33f..5d238a7a865 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_wb_u8 (a, b, 2, p);
> +  return vdwdupq_x_wb_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_x_u8 (a, b, 2, p);
> +  return vdwdupq_x_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vdwdupq_x_u8 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 05/35] arm: improve vidupq* tests
  2022-11-17 16:37 ` [PATCH 05/35] arm: improve vidupq* tests Andrea Corallo
@ 2022-11-18 16:36   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:36 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 05/35] arm: improve vidupq* tests
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c: Improve tests.
> 	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vidupq_m_n_u16.c       | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vidupq_m_n_u32.c       | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vidupq_m_n_u8.c        | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vidupq_m_wb_u16.c      | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vidupq_m_wb_u32.c      | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vidupq_m_wb_u8.c       | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vidupq_n_u16.c         | 32 ++++++++++--
>  .../arm/mve/intrinsics/vidupq_n_u32.c         | 28 +++++++++-
>  .../arm/mve/intrinsics/vidupq_n_u8.c          | 28 +++++++++-
>  .../arm/mve/intrinsics/vidupq_wb_u16.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/vidupq_wb_u32.c        | 28 +++++++++-
>  .../arm/mve/intrinsics/vidupq_wb_u8.c         | 28 +++++++++-
>  .../arm/mve/intrinsics/vidupq_x_n_u16.c       | 46 +++++++++++++---
>  .../arm/mve/intrinsics/vidupq_x_n_u32.c       | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vidupq_x_n_u8.c        | 42 +++++++++++++--
>  .../arm/mve/intrinsics/vidupq_x_wb_u16.c      | 52 +++++++++++++++----
>  .../arm/mve/intrinsics/vidupq_x_wb_u32.c      | 52 +++++++++++++++----
>  .../arm/mve/intrinsics/vidupq_x_wb_u8.c       | 52 +++++++++++++++----
>  18 files changed, 634 insertions(+), 88 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
> index 822d41197e6..b4ee7af36e3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vidupq_m_n_u16 (inactive, a, 4, p);
> +  return vidupq_m_n_u16 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vidupq_m (inactive, a, 4, p);
> +  return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
> index c01826e15dc..b13a7a80dcb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_m_n_u32 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c
> index e269665813c..b731002724a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_m_n_u8 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c
> index 8d21bc7db80..0e2ad6a2b55 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
>  {
> -  return vidupq_m_wb_u16 (inactive, a, 4, p);
> +  return vidupq_m_wb_u16 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
>  {
> -  return vidupq_m (inactive, a, 4, p);
> +  return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c
> index e7bc06cd826..786a05eee35 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
>  {
>    return vidupq_m_wb_u32 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
>  {
>    return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c
> index a8a2f9a1c49..3fcc3ba0d67 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
>  {
>    return vidupq_m_wb_u8 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t *a, mve_pred16_t p)
>  {
>    return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c
> index c59ca1ebf74..a6ffdc05ce5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a)
>  {
> -  return vidupq_n_u16 (a, 4);
> +  return vidupq_n_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a)
>  {
> -  return vidupq_u16 (a, 4);
> +  return vidupq_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return vidupq_u16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c
> index 7e835e0868c..8cd43e38255 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a)
>  {
>    return vidupq_n_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a)
>  {
>    return vidupq_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return vidupq_u32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c
> index 06d1a1a1480..4005eabb45d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a)
>  {
>    return vidupq_n_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a)
>  {
>    return vidupq_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return vidupq_u8 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c
> index 1cb0ded198f..3ad89c0536c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t *a)
>  {
> -  return vidupq_wb_u16 (a, 4);
> +  return vidupq_wb_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t *a)
>  {
> -  return vidupq_u16 (a, 4);
> +  return vidupq_u16 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vidup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return vidupq_u16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
> index e5d9c5327fb..45eb1b09a5b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t *a)
>  {
>    return vidupq_wb_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t *a)
>  {
>    return vidupq_u32 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vidup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return vidupq_u32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
> index 57e1bb46776..beb0aae67a9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t *a)
>  {
>    return vidupq_wb_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t *a)
>  {
>    return vidupq_u8 (a, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vidup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vidup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return vidupq_u8 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
> index bdf8ec2b047..74cd4310213 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
> -  return vidupq_x_n_u16 (a, 4, p);
> +  return vidupq_x_n_u16 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a, mve_pred16_t p)
>  {
> -  return vidupq_x_u16 (a, 4, p);
> +  return vidupq_x_u16 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vidupq_x_u16 (1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
> index 8be549cb446..3111b1a54e6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_x_n_u32 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_x_u32 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vidupq_x_u32 (1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
> index 1e1975017de..5bedb4f9e79 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_x_n_u8 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a, mve_pred16_t p)
>  {
>    return vidupq_x_u8 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (mve_pred16_t p)
> +{
> +  return vidupq_x_u8 (1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
> index 31197a76cfa..caf334fa32f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c
> @@ -1,25 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> -uint32_t *a;
> -
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (mve_pred16_t p)
> +foo (uint32_t *a, mve_pred16_t p)
>  {
> -  return vidupq_x_wb_u16 (a, 8, p);
> +  return vidupq_x_wb_u16 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 (uint32_t *a, mve_pred16_t p)
> +{
> +  return vidupq_x_u16 (a, 1, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (mve_pred16_t p)
> +foo2 (mve_pred16_t p)
>  {
> -  return vidupq_x_u16 (a, 8, p);
> +  return vidupq_x_u16 (1, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
> index cef56f133e8..11895e303cf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c
> @@ -1,25 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> -uint32_t *a;
> -
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (mve_pred16_t p)
> +foo (uint32_t *a, mve_pred16_t p)
>  {
> -  return vidupq_x_wb_u32 (a, 2, p);
> +  return vidupq_x_wb_u32 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 (uint32_t *a, mve_pred16_t p)
> +{
> +  return vidupq_x_u32 (a, 1, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (mve_pred16_t p)
> +foo2 (mve_pred16_t p)
>  {
> -  return vidupq_x_u32 (a, 2, p);
> +  return vidupq_x_u32 (1, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
> index 0403ba1174c..b951d4cfe94 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c
> @@ -1,25 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> -uint32_t * a;
> -
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (mve_pred16_t p)
> +foo (uint32_t *a, mve_pred16_t p)
>  {
> -  return vidupq_x_wb_u8 (a, 2, p);
> +  return vidupq_x_wb_u8 (a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 (uint32_t *a, mve_pred16_t p)
> +{
> +  return vidupq_x_u8 (a, 1, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vidupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (mve_pred16_t p)
> +foo2 (mve_pred16_t p)
>  {
> -  return vidupq_x_u8 (a, 2, p);
> +  return vidupq_x_u8 (1, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 06/35] arm: improve tests and fix vdupq*
  2022-11-17 16:37 ` [PATCH 06/35] arm: improve tests and fix vdupq* Andrea Corallo
@ 2022-11-18 16:37   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:37 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 06/35] arm: improve tests and fix vdupq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vdupq_n_f<mode>)
> 	(mve_vdupq_n_<supf><mode>, mve_vdupq_m_n_<supf><mode>)
> 	(mve_vdupq_m_n_f<mode>): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md                         |  8 ++--
>  .../arm/mve/intrinsics/vdupq_m_n_f16.c        | 41 +++++++++++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_f32.c        | 41 +++++++++++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_s16.c        | 25 +++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_s32.c        | 25 +++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_s8.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_u16.c        | 41 +++++++++++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_u32.c        | 41 +++++++++++++++++--
>  .../arm/mve/intrinsics/vdupq_m_n_u8.c         | 41 +++++++++++++++++--
>  .../arm/mve/intrinsics/vdupq_n_f16.c          | 21 +++++++++-
>  .../arm/mve/intrinsics/vdupq_n_f32.c          | 21 +++++++++-
>  .../arm/mve/intrinsics/vdupq_n_s16.c          | 13 ++++--
>  .../arm/mve/intrinsics/vdupq_n_s32.c          | 13 ++++--
>  .../arm/mve/intrinsics/vdupq_n_s8.c           |  9 +++-
>  .../arm/mve/intrinsics/vdupq_n_u16.c          | 23 ++++++++++-
>  .../arm/mve/intrinsics/vdupq_n_u32.c          | 23 ++++++++++-
>  .../arm/mve/intrinsics/vdupq_n_u8.c           | 23 ++++++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_f16.c        | 30 +++++++++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_f32.c        | 30 +++++++++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_s16.c        | 14 ++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_s32.c        | 14 ++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_s8.c         | 14 ++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_u16.c        | 30 +++++++++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_u32.c        | 30 +++++++++++++-
>  .../arm/mve/intrinsics/vdupq_x_n_u8.c         | 30 +++++++++++++-
>  25 files changed, 567 insertions(+), 59 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 58ffe03c499..6d5270281ec 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -266,7 +266,7 @@ (define_insn "mve_vdupq_n_f<mode>"
>  	 VDUPQ_N_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vdup.%#<V_sz_elem>   %q0, %1"
> +  "vdup.%#<V_sz_elem>\t%q0, %1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -435,7 +435,7 @@ (define_insn "mve_vdupq_n_<supf><mode>"
>  	 VDUPQ_N))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vdup.%#<V_sz_elem>   %q0, %1"
> +  "vdup.%#<V_sz_elem>\t%q0, %1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -3046,7 +3046,7 @@ (define_insn "mve_vdupq_m_n_<supf><mode>"
>  	 VDUPQ_M_N))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vdupt.%#<V_sz_elem>	%q0, %2"
> +  "vpst\;vdupt.%#<V_sz_elem>\t%q0, %2"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -3991,7 +3991,7 @@ (define_insn "mve_vdupq_m_n_f<mode>"
>  	 VDUPQ_M_N_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vdupt.%#<V_sz_elem>	%q0, %2"
> +  "vpst\;vdupt.%#<V_sz_elem>\t%q0, %2"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
> index 0b749be3527..bfa471bcb31 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_f16 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t inactive, mve_pred16_t p)
> +{
> +  return vdupq_m (inactive, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
> index 9cca5310c7a..e1dd8f58ad0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_f32 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t inactive, mve_pred16_t p)
> +{
> +  return vdupq_m (inactive, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
> index b521f13e94f..52304ace03a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_s16 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
> index 96aa195dc18..44a80c5d5bc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_s32 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
> index f1d222000c1..1630a3b9234 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_s8 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
> index 39d0c9f502d..d3df8b69248 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_u16 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vdupq_m (inactive, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
> index fc107172e16..e6bb0cc2c38 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_u32 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return vdupq_m (inactive, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
> index 9fd3bc443cb..ad6f6d04ae3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8_t a, mve_pred16_t p)
>  {
>    return vdupq_m_n_u8 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8_t a, mve_pred16_t p)
>  {
>    return vdupq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return vdupq_m (inactive, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
> index 62bfc194533..fc5a7933653 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
> @@ -1,13 +1,32 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16_t a)
>  {
>    return vdupq_n_f16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.16"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo1 ()
> +{
> +  return vdupq_n_f16 (1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
> index f5ad2286d8d..a6be82e5927 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
> @@ -1,13 +1,32 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32_t a)
>  {
>    return vdupq_n_f32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.32"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo1 ()
> +{
> +  return vdupq_n_f32 (1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
> index 1378522a18e..f842b96c3b1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
> @@ -1,13 +1,20 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16_t a)
>  {
>    return vdupq_n_s16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
> index 43affe856c0..05cbff8fdae 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
> @@ -1,13 +1,20 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32_t a)
>  {
>    return vdupq_n_s32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
> index 3f934dc5d59..1d141161604 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
> @@ -1,13 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8_t a)
>  {
>    return vdupq_n_s8 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
> index 93268643fec..4839d427e65 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
> @@ -1,13 +1,32 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16_t a)
>  {
> -    return vdupq_n_u16 (a);
> +  return vdupq_n_u16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.16"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vdup.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 ()
> +{
> +  return vdupq_n_u16 (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
> index 276e9ddc67f..f0069eb7280 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
> @@ -1,13 +1,32 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a)
>  {
> -    return vdupq_n_u32 (a);
> +  return vdupq_n_u32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.32"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vdup.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 ()
> +{
> +  return vdupq_n_u32 (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
> index d0361c15047..fe26687ae45 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
> @@ -1,13 +1,32 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vdup.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8_t a)
>  {
> -    return vdupq_n_u8 (a);
> +  return vdupq_n_u8 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vdup.8"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vdup.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 ()
> +{
> +  return vdupq_n_u8 (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c
> index c91ee62791c..11ebb47f94f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c
> @@ -1,14 +1,40 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_f16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.16"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo1 (mve_pred16_t p)
> +{
> +  return vdupq_x_n_f16 (1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c
> index c2b39051f5b..4e79bd54f71 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c
> @@ -1,14 +1,40 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_f32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.32"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo1 (mve_pred16_t p)
> +{
> +  return vdupq_x_n_f32 (1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
> index cc8a5bfeca1..90288777df7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
> @@ -1,14 +1,24 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_s16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
> index b3ed3eb68e8..c4c906e0682 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
> @@ -1,14 +1,24 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_s32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c
> index 3be865dcc84..6234730827e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c
> @@ -1,14 +1,24 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_s8 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c
> index d01338aeb91..821fcddcab1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c
> @@ -1,14 +1,40 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_u16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.16"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 (mve_pred16_t p)
> +{
> +  return vdupq_x_n_u16 (1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c
> index 8fa7d4552bc..20125df6226 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c
> @@ -1,14 +1,40 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_u32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.32"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 (mve_pred16_t p)
> +{
> +  return vdupq_x_n_u32 (1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c
> index 96ad899c9c2..defaaeebfcf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c
> @@ -1,14 +1,40 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8_t a, mve_pred16_t p)
>  {
>    return vdupq_x_n_u8 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdupt.8"  }  } */
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vdupt.8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 (mve_pred16_t p)
> +{
> +  return vdupq_x_n_u8 (1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 07/35] arm: improve tests and fix vcmp*
  2022-11-17 16:37 ` [PATCH 07/35] arm: improve tests and fix vcmp* Andrea Corallo
@ 2022-11-18 16:40   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:40 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 07/35] arm: improve tests and fix vcmp*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Fix
> 	spacing.
> 	* config/arm/arm_mve.h (__arm_vcmpgtq_m, __arm_vcmpleq_m)
> 	(__arm_vcmpltq_m, __arm_vcmpneq_m): Add missing defines.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmphiq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpleq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpltq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vcmpneq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h                      | 47 +++++++++++++++++++
>  gcc/config/arm/mve.md                         |  2 +-
>  .../arm/mve/intrinsics/vcmpcsq_m_n_u16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpcsq_m_n_u32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpcsq_m_n_u8.c       | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpcsq_m_u16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpcsq_m_u32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpcsq_m_u8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpcsq_n_u16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpcsq_n_u32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpcsq_n_u8.c         | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpcsq_u16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpcsq_u32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpcsq_u8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_f16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_f32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_m_f16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_f32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_f16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_f32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_s16.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_s32.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_s8.c       | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_u16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_u32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_n_u8.c       | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_s16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_s32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_s8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_u16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_u32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_m_u8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpeqq_n_f16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_f32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_s16.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_s32.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_s8.c         | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_u16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_u32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpeqq_n_u8.c         | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpeqq_s16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_s32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_s8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_u16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_u32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpeqq_u8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_f16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_f32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_m_f16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_f32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_n_f16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_n_f32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_n_s16.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_n_s32.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_n_s8.c       | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_s16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_s32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_m_s8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgeq_n_f16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpgeq_n_f32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpgeq_n_s16.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_n_s32.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_n_s8.c         | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_s16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_s32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgeq_s8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_f16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_f32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_m_f16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_f32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_n_f16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_n_f32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_n_s16.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_n_s32.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_n_s8.c       | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_s16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_s32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_m_s8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpgtq_n_f16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpgtq_n_f32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpgtq_n_s16.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_n_s32.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_n_s8.c         | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_s16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_s32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpgtq_s8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmphiq_m_n_u16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmphiq_m_n_u32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmphiq_m_n_u8.c       | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmphiq_m_u16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmphiq_m_u32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmphiq_m_u8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmphiq_n_u16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmphiq_n_u32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmphiq_n_u8.c         | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmphiq_u16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmphiq_u32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmphiq_u8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_f16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_f32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_m_f16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_f32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_n_f16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_n_f32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_n_s16.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_n_s32.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_n_s8.c       | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_s16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_s32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_m_s8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpleq_n_f16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpleq_n_f32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpleq_n_s16.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_n_s32.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_n_s8.c         | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_s16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_s32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpleq_s8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_f16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_f32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_m_f16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_f32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_n_f16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_n_f32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_n_s16.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_n_s32.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_n_s8.c       | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_s16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_s32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_m_s8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpltq_n_f16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpltq_n_f32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpltq_n_s16.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_n_s32.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_n_s8.c         | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_s16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_s32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpltq_s8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_f16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_f32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_m_f16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_f32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_f16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_f32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_s16.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_s32.c      | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_s8.c       | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_u16.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_u32.c      | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_n_u8.c       | 47 +++++++++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_s16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_s32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_s8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_u16.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_u32.c        | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_m_u8.c         | 29 ++++++++++--
>  .../arm/mve/intrinsics/vcmpneq_n_f16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_f32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_s16.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_s32.c        | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_s8.c         | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_u16.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_u32.c        | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpneq_n_u8.c         | 34 +++++++++++++-
>  .../arm/mve/intrinsics/vcmpneq_s16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_s32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_s8.c           | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_u16.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_u32.c          | 20 +++++++-
>  .../arm/mve/intrinsics/vcmpneq_u8.c           | 20 +++++++-
>  170 files changed, 4512 insertions(+), 421 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 073e3711623..684f997520f 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -39229,6 +39229,53 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
> 
> +
> +#define __arm_vcmpgtq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
> +
> +#define __arm_vcmpleq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpleq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpleq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
> +
> +#define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpltq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpltq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2));})
> +
> +#define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpneq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
> +
>  #define __arm_vdupq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 6d5270281ec..3330a220aea 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -831,7 +831,7 @@ (define_insn "@mve_vcmp<mve_cmp_op>q_<mode>"
>  		    (match_operand:MVE_2 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vcmp.<mve_cmp_type>%#<V_sz_elem>  <mve_cmp_op>, %q1, %q2"
> +  "vcmp.<mve_cmp_type>%#<V_sz_elem>\t<mve_cmp_op>, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c
> index a1640133012..de9fe5e7d01 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpcsq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c
> index d269ec7e3ab..04df1b2dc61 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpcsq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c
> index 52c16b3e70f..34ebadca248 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vcmpcsq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c
> index e68afa316a9..bc03bf687de 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c
> index 05d1b21b279..8e216d49a02 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c
> index 4c8a9d0aa2c..ac4196a2e48 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmpcsq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
> index 4124036003e..6038f4c8c65 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vcmpcsq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vcmpcsq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.u16	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a)
> +{
> +  return vcmpcsq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
> index 463c1ee12b4..9f39aa761c8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vcmpcsq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vcmpcsq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.u32	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a)
> +{
> +  return vcmpcsq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
> index 92bc44a4bb6..0ce2cd13a7b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vcmpcsq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vcmpcsq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.u8	cs, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vcmpcsq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c
> index 26c7d750cef..5598d06875c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmpcsq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u16	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmpcsq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c
> index c91b0e1c2e3..99b232b05dd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmpcsq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u32	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmpcsq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c
> index 51ddab91500..571e57135ab 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmpcsq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u8	cs, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmpcsq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c
> index 556351f4984..57b276a1d4c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vcmpeqq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c
> index 65b2f240520..ab1b25e2888 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vcmpeqq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
> index 91b0ffa0afd..c5587884d0e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
> index d66e9c8be34..4e9675fff51 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
> index 46b3f4499d3..a3cae828e79 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpeqq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
> index 7d672c129db..a7ce9e0c7e3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpeqq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
> index 912d4ad893d..7ba481e169f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
> index 947c331622d..13c88eaabb5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
> index e215d655ea2..dcf276dee44 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
> index ea4716c450e..d59d5149a30 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpeqq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
> index 489c6ec0cb3..1fbf385d030 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpeqq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
> index e8dfce432d1..92758c98c9a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vcmpeqq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c
> index 7e4c141e5d2..1ea35ed924b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c
> index 904cfb6fe37..a9bc9733842 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c
> index a7e12164e32..a9fe771a101 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c
> index 283e1fd036e..826901874d7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c
> index ad1739bd609..512b7f9c889 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c
> index 595142e9cda..01b4507ba63 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmpeqq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
> index f97209d2322..cf2812558ff 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vcmpeqq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a)
> +{
> +  return vcmpeqq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
> index c80843288b2..13817174282 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vcmpeqq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a)
> +{
> +  return vcmpeqq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
> index 69f1f531af4..bd29828492e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vcmpeqq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
> index 06032dbcc20..2a0d84e9b51 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vcmpeqq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
> index 3ebd88be85b..524bbe9f3cb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vcmpeqq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
> index 2f6c53a525e..3eeaa49aa97 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vcmpeqq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a)
> +{
> +  return vcmpeqq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
> index 22fb5be97c5..a881bb841af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vcmpeqq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a)
> +{
> +  return vcmpeqq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
> index 79eaeed6950..429b2e35eb7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vcmpeqq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vcmpeqq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
> index 7951ead8a31..92a87c08773 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vcmpeqq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
> index 659ccb4ac14..d3b87d59bfa 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vcmpeqq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
> index 9282ec2a97a..2b71bbf75f6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vcmpeqq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
> index 318b7aa9306..1830b667bb6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmpeqq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
> index 88e015f1fa3..2b2a5f920f3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmpeqq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
> index 990a96f7b3f..9450c203394 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmpeqq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	eq, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmpeqq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
> index eea63a2fe50..fd8bcab4f25 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vcmpgeq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
> index 64243fe3e8c..a2d50b580e7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vcmpgeq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
> index 3588b0a536f..a631825fadd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
> index 8ed1d22e919..b94e0738ef0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
> index d106af8f53b..9f4903d9cfd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpgeq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
> index 1feef8adb7f..679e644f165 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpgeq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
> index c0ad38f6c6f..45e26d0a77b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
> index 8974ce4d11a..3a6cad921f2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
> index 981aa1b516c..ce1ca30d6ea 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
> index 587432a6af1..51587a38b72 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
> index e460a8dcafc..3ff0aaaa414 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
> index cde28a314b9..df71ee57945 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpgeq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
> index 907fa5d50f6..2ca1b9d6684 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vcmpgeq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a)
> +{
> +  return vcmpgeq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
> index e4d1406c049..3af110bd2b2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vcmpgeq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a)
> +{
> +  return vcmpgeq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
> index f4aad09e783..3c1af8a93ab 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vcmpgeq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
> index 2baa5204819..8b4e0f426e5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vcmpgeq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
> index 1dcffcc3050..c1669bcdd90 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vcmpgeq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	ge, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
> index 817ffb2d8ac..593c7410dcb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vcmpgeq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
> index d608b7fc9cf..9e26ea9938a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vcmpgeq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
> index 506e6cede95..3cb2832e159 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vcmpgeq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	ge, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vcmpgeq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
> index e2bfd7ed156..8835fe08dba 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vcmpgtq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
> index 1b4433f0e76..e1470884708 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vcmpgtq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
> index def3f90a79d..cb9d5f4036f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
> index 41a11563f36..b249b831782 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
> index 80c86f65825..b375983f01e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpgtq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
> index 9b7aaadfe71..208a285cb39 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpgtq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
> index c0719d0110c..248e3093d2a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
> index 26df8cea9fc..9843288296e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
> index f20c50d69c1..80f1aa9ead0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
> index da97abceb2e..9289c00b5af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
> index ab7c218c7af..8a3d7606bb7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
> index 13520d1067b..2760795eb86 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpgtq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
> index 98e152cd999..9f2a4be319a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vcmpgtq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a)
> +{
> +  return vcmpgtq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
> index 5691e2f9d35..bbf18ebe6e7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vcmpgtq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a)
> +{
> +  return vcmpgtq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c
> index bc3bdbae2da..d833cb6f58e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vcmpgtq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c
> index 409a3f9d808..28cd51b9582 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vcmpgtq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c
> index 2624307be9d..5a953ca55f4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vcmpgtq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	gt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c
> index be19e19f09f..b9c9da486f5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vcmpgtq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c
> index 95f6c703b9d..0f79385358e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vcmpgtq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c
> index 8ba180d8e39..f59dad94a57 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vcmpgtq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	gt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vcmpgtq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c
> index 26e5fe3f900..136a2e44259 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vcmphiq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c
> index 51396b8d0cd..5640b97afaf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vcmphiq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c
> index 475f2e82345..e6474e45487 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vcmphiq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c
> index 98ba895fde0..38b9b90c803 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c
> index ee561b02d0c..97c8c1dfe05 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c
> index 0c5b29e2673..e2024ccda25 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmphiq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
> index d39b755441d..36107fc7b8d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vcmphiq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vcmphiq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.u16	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a)
> +{
> +  return vcmphiq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
> index dbedea9b078..d34de8f65c7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vcmphiq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vcmphiq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.u32	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a)
> +{
> +  return vcmphiq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
> index 967bb206886..93a05b1a857 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vcmphiq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vcmphiq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.u8	hi, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vcmphiq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c
> index f9399498a99..40e65dc52f4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmphiq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u16	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmphiq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c
> index becdef0696a..d87a4185762 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmphiq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u32	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmphiq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c
> index 933cc69507d..80fd2a40b0f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_u8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmphiq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.u8	hi, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmphiq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c
> index c2e69a5de92..209d81096af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vcmpleq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c
> index 923aee050d3..b92c5f66fd9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_f32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vcmpleq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c
> index 66a37192985..e6136898ded 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c
> index e679b338d58..2304e98d253 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c
> index 42049fd57a4..a61db2817c1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpleq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c
> index c68bd4e5900..7a2cdb4059d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpleq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c
> index 0cdc14455a3..69fcab15b8a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c
> index a955af8fa2b..617ebd6144f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c
> index d9951e4a8cf..b8ee50dd55c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c
> index f16aff86ef0..fcc376d6ec3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c
> index 2c4e659e9cf..9983e89d80c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c
> index 69b88cfb389..504e4feb5d1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpleq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
> index 3fa3c5e0310..cfa6dbc07c7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vcmpleq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a)
> +{
> +  return vcmpleq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
> index 8349de7b68c..c89558f4076 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vcmpleq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a)
> +{
> +  return vcmpleq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c
> index 5ecae572227..da73fc14b77 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vcmpleq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c
> index 02320e7a552..0951a5c13fb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vcmpleq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c
> index a0ac97328b7..e4553354681 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vcmpleq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	le, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c
> index 2fb4acd3d74..68500da9ddf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vcmpleq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c
> index 2ae998efb7c..1966bcd94d3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vcmpleq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c
> index da06b019cc1..e9f6e47e5d6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vcmpleq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	le, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vcmpleq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c
> index eab80b2ddd9..b4958816bd8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vcmpltq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c
> index f17d16482dd..752ab2b3e49 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_f32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vcmpltq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c
> index 93c36f3a613..cbaacbe2b47 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c
> index a17f0b02a95..96d0e7c7cc6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c
> index 45d0f51b4d7..1e5db53198e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpltq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c
> index 16e37ccaf8d..77de40ade01 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpltq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c
> index d0e322fbede..beebe65a58f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c
> index 7ec7963267a..07260c56ed3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c
> index 22434e88cd6..7d1e9e7fbde 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c
> index 359c0640784..c0f6dfc9432 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c
> index 3df7e89a6f5..b6fc4700e73 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c
> index 1055c2b661c..545b76359ad 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpltq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
> index 2d55af20dd3..401ef21ba2b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vcmpltq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a)
> +{
> +  return vcmpltq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
> index 2590ca83c45..380f071e564 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vcmpltq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a)
> +{
> +  return vcmpltq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c
> index 169f6ad4610..a1d12392dd2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vcmpltq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c
> index 534047c2df3..6332f75f327 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vcmpltq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c
> index da659f1f2be..e0ac80caeb0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vcmpltq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	lt, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c
> index da4c90a07de..23843ad88f3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vcmpltq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s16	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c
> index 5dc218a5f40..aeb7a6f9896 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vcmpltq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s32	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c
> index ea5853c212c..2129b56a5f7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vcmpltq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.s8	lt, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vcmpltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c
> index 8d1c6096c56..c27ea2f0de8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vcmpneq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c
> index 860bd69c129..609de44d8e7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_f32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vcmpneq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c
> index a4e62de7272..98f22337d61 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c
> index b18a2e5fd88..7f6e96ae47e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c
> index c127b3a68f6..71b3476fb18 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpneq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c
> index a8423d45708..d6dea8db865 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpneq_m (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c
> index 63ee1c3bffb..e72c9b62829 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c
> index 10f6d448d76..47c90e31f49 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c
> index 66e5d158c51..9d9da100046 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c
> index ffe6ff919cf..ea8cf24b358 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vcmpneq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c
> index 55e796a1138..30291dcdd9b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vcmpneq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c
> index 3c8bd16647a..be75376a691 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c
> @@ -1,22 +1,63 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vcmpneq_m (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c
> index d3e1ce0e690..60e868141d0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c
> index f5602ffd0da..780c544bef3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c
> index 84b8b1617b0..15f6d316cba 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c
> index 3c8943719bb..300852ed7b3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c
> index 980cc4124b2..227b5f01eca 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c
> index 2615dcb37b9..cfcb59f49cf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vcmpt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vcmpt.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vcmpneq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
> index e9e2a9c7b04..29e43f3fdf8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vcmpneq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float16x8_t a)
> +{
> +  return vcmpneq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
> index eb64b17969c..688e77cd044 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vcmpneq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.f32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (float32x4_t a)
> +{
> +  return vcmpneq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c
> index 14689242ee4..2afc34d16e5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vcmpneq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c
> index 53418ff3923..6c323161316 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vcmpneq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c
> index fa405c281b4..5483d6dd2fe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vcmpneq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
> index cc8540b3a6c..d8edfb0d825 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vcmpneq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint16x8_t a)
> +{
> +  return vcmpneq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
> index 07c9b1ade96..2b7a6b56830 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vcmpneq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint32x4_t a)
> +{
> +  return vcmpneq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
> index eac5e96384e..2dab43af331 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c
> @@ -1,21 +1,51 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vcmpneq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
> +mve_pred16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vcmpneq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
> index 6b04ce70ffc..d57b607baa9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vcmpneq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
> index cfb98d7e650..e02171f6686 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vcmpneq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
> index ae69be4ba0b..0abef8c3e00 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vcmpneq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
> index 51059f21191..7144f3ee2fc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmpneq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i16	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
> index 42e4a3f4f2d..a31134f2f1d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmpneq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
> index addacc15833..2801c8e3763 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
> @@ -1,21 +1,37 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmpneq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vcmp.i8	ne, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +**	vmrs	(?:ip|fp|r[0-9]+), p0(?:	@.*|)
> +**	...
> +*/
>  mve_pred16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vcmpneq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vcmp.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 08/35] arm: improve tests for vmin*
  2022-11-17 16:37 ` [PATCH 08/35] arm: improve tests for vmin* Andrea Corallo
@ 2022-11-18 16:41   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:41 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 08/35] arm: improve tests for vmin*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vminaq_m_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vminaq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminaq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminaq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminaq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminaq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmaq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmaq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminq_x_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vminaq_m_s16.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vminaq_m_s32.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vminaq_m_s8.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vminaq_s16.c           | 16 +++++++-
>  .../arm/mve/intrinsics/vminaq_s32.c           | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vminaq_s8.c | 16 +++++++-
>  .../arm/mve/intrinsics/vminavq_p_s16.c        | 41 ++++++++++++++++---
>  .../arm/mve/intrinsics/vminavq_p_s32.c        | 41 ++++++++++++++++---
>  .../arm/mve/intrinsics/vminavq_p_s8.c         | 41 ++++++++++++++++---
>  .../arm/mve/intrinsics/vminavq_s16.c          | 29 ++++++++++---
>  .../arm/mve/intrinsics/vminavq_s32.c          | 29 ++++++++++---
>  .../arm/mve/intrinsics/vminavq_s8.c           | 29 ++++++++++---
>  .../arm/mve/intrinsics/vminnmaq_f16.c         | 16 +++++++-
>  .../arm/mve/intrinsics/vminnmaq_f32.c         | 16 +++++++-
>  .../arm/mve/intrinsics/vminnmaq_m_f16.c       | 25 +++++++++--
>  .../arm/mve/intrinsics/vminnmaq_m_f32.c       | 25 +++++++++--
>  .../arm/mve/intrinsics/vminnmavq_f16.c        | 27 +++++++++---
>  .../arm/mve/intrinsics/vminnmavq_f32.c        | 27 +++++++++---
>  .../arm/mve/intrinsics/vminnmavq_p_f16.c      | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminnmavq_p_f32.c      | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminnmq_f16.c          | 16 +++++++-
>  .../arm/mve/intrinsics/vminnmq_f32.c          | 16 +++++++-
>  .../arm/mve/intrinsics/vminnmq_m_f16.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminnmq_m_f32.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminnmq_x_f16.c        | 25 +++++++++--
>  .../arm/mve/intrinsics/vminnmq_x_f32.c        | 25 +++++++++--
>  .../arm/mve/intrinsics/vminnmvq_f16.c         | 27 +++++++++---
>  .../arm/mve/intrinsics/vminnmvq_f32.c         | 27 +++++++++---
>  .../arm/mve/intrinsics/vminnmvq_p_f16.c       | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminnmvq_p_f32.c       | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminq_m_s16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminq_m_s32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminq_m_s8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminq_m_u16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminq_m_u32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vminq_m_u8.c           | 26 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vminq_s16.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vminq_s32.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vminq_s8.c  | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vminq_u16.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vminq_u32.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vminq_u8.c  | 16 +++++++-
>  .../arm/mve/intrinsics/vminq_x_s16.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vminq_x_s32.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vminq_x_s8.c           | 25 +++++++++--
>  .../arm/mve/intrinsics/vminq_x_u16.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vminq_x_u32.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vminq_x_u8.c           | 25 +++++++++--
>  .../arm/mve/intrinsics/vminvq_p_s16.c         | 31 ++++++++++----
>  .../arm/mve/intrinsics/vminvq_p_s32.c         | 31 ++++++++++----
>  .../arm/mve/intrinsics/vminvq_p_s8.c          | 31 ++++++++++----
>  .../arm/mve/intrinsics/vminvq_p_u16.c         | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminvq_p_u32.c         | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminvq_p_u8.c          | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vminvq_s16.c           | 22 ++++++----
>  .../arm/mve/intrinsics/vminvq_s32.c           | 22 ++++++----
>  .../gcc.target/arm/mve/intrinsics/vminvq_s8.c | 22 ++++++----
>  .../arm/mve/intrinsics/vminvq_u16.c           | 29 ++++++++++---
>  .../arm/mve/intrinsics/vminvq_u32.c           | 26 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vminvq_u8.c | 29 ++++++++++---
>  60 files changed, 1320 insertions(+), 255 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c
> index 0324110c6a8..925b9154ca7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminaq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminat.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c
> index a2886d4f40f..296f69dfcda 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminaq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c
> index 95eb038efc0..cf6fecc3461 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_m_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminaq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminat.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c
> index 3a157e00a27..63f59f8c80a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmina.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int16x8_t b)
>  {
>    return vminaq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmina.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmina.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int16x8_t b)
>  {
>    return vminaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmina.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c
> index 5c732c65d63..eb0a54cbe19 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmina.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32x4_t b)
>  {
>    return vminaq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmina.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmina.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32x4_t b)
>  {
>    return vminaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmina.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c
> index 2e4dad141ce..b875308863d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminaq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmina.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int8x16_t b)
>  {
>    return vminaq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmina.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmina.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int8x16_t b)
>  {
>    return vminaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmina.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c
> index 9303ae02e39..5d3c40fb1fc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, int16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint16_t a, int16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminavq_p (a, b, p);
>  }
> 
> -
> -int16_t
> -foo2 (uint8_t a, int16x8_t b, mve_pred16_t p)
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16_t
> +foo2 (int16x8_t b, mve_pred16_t p)
>  {
> -  return vminavq_p (a, b, p);
> +  return vminavq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminavt.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c
> index 36247f68b2c..ee4ff251d63 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint32_t a, int32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminavq_p (a, b, p);
>  }
> 
> -
> -int32_t
> -foo2 (uint16_t a, int32x4_t b, mve_pred16_t p)
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int32x4_t b, mve_pred16_t p)
>  {
> -  return vminavq_p (a, b, p);
> +  return vminavq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminavt.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c
> index d3361615dcc..14602c29719 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_p_s8.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, int8x16_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint8_t a, int8x16_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminavq_p (a, b, p);
>  }
> 
> -
> -int8_t
> -foo2 (uint32_t a, int8x16_t b, mve_pred16_t p)
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8_t
> +foo2 (int8x16_t b, mve_pred16_t p)
>  {
> -  return vminavq_p (a, b, p);
> +  return vminavq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminavt.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c
> index 17e4edca2f1..51f75ae1f6a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, int16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint16_t a, int16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, int16x8_t b)
>  {
>    return vminavq (a, b);
>  }
> 
> -
> -int16_t
> -foo2 (uint8_t a, int16x8_t b)
> +/*
> +**foo2:
> +**	...
> +**	vminav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16_t
> +foo2 (int16x8_t b)
>  {
> -  return vminavq (a, b);
> +  return vminavq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminav.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c
> index 032d02b8857..d1602cebe18 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint32_t a, int32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b)
>  {
>    return vminavq (a, b);
>  }
> 
> -
> -int32_t
> -foo2 (uint16_t a, int32x4_t b)
> +/*
> +**foo2:
> +**	...
> +**	vminav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int32x4_t b)
>  {
> -  return vminavq (a, b);
> +  return vminavq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminav.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c
> index 2a2bb3d6146..f4c9b045b90 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminavq_s8.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, int8x16_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint8_t a, int8x16_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, int8x16_t b)
>  {
>    return vminavq (a, b);
>  }
> 
> -
> -int8_t
> -foo2 (uint32_t a, int8x16_t b)
> +/*
> +**foo2:
> +**	...
> +**	vminav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8_t
> +foo2 (int8x16_t b)
>  {
> -  return vminavq (a, b);
> +  return vminavq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminav.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c
> index cf32186d642..1728d104266 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vminnmaq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnma.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vminnmaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnma.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c
> index 1c3f19c9e1b..42b4265d9cc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vminnmaq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnma.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vminnmaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnma.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c
> index 4423903e913..51b85bd2b04 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmaq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmat.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c
> index 683f40ad3d8..2f0423ecb4f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmaq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmat.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c
> index fadb23e05c8..17e4ad16759 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b)
>  {
>    return vminnmavq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vminnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b)
> +foo2 (float16x8_t b)
>  {
> -  return vminnmavq (a, b);
> +  return vminnmavq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmav.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c
> index 84714a96b9f..2758e59666e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_f32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b)
>  {
>    return vminnmavq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vminnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b)
> +foo2 (float32x4_t b)
>  {
> -  return vminnmavq (a, b);
> +  return vminnmavq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmav.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c
> index c79fa307ae0..b60a6627aea 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmavq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
> +foo2 (float16x8_t b, mve_pred16_t p)
>  {
> -  return vminnmavq_p (a, b, p);
> +  return vminnmavq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmavt.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c
> index bea04c7aac6..6fa97b74a65 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmavq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
> +foo2 (float32x4_t b, mve_pred16_t p)
>  {
> -  return vminnmavq_p (a, b, p);
> +  return vminnmavq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmavt.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c
> index 18d4a4c1330..c0962b52631 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vminnmq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnm.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vminnmq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnm.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c
> index 34144cad17f..a9c3e5f74b1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vminnmq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnm.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vminnmq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vminnm.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c
> index e5533d28035..466264249c5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmq_m_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmt.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c
> index 382d16c4489..57edc8e1a80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmq_m_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmt.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c
> index 04d606ce5cd..73b4ccba080 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmq_x_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c
> index 87cd970fd11..9a824566212 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmq_x_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vminnmt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c
> index 0eb3a4af14e..dc00d02df7d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b)
>  {
>    return vminnmvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vminnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b)
> +foo2 (float16x8_t b)
>  {
> -  return vminnmvq (a, b);
> +  return vminnmvq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmv.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c
> index f3183508f8e..ff23c818452 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_f32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b)
>  {
>    return vminnmvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vminnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b)
> +foo2 (float32x4_t b)
>  {
> -  return vminnmvq (a, b);
> +  return vminnmvq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmv.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c
> index 16f6ac514c8..ad99f586d11 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vminnmvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
> +foo2 (float16x8_t b, mve_pred16_t p)
>  {
> -  return vminnmvq_p (a, b, p);
> +  return vminnmvq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmvt.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c
> index a8e4f9ffba7..3c7e5c07a68 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vminnmvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
> +foo2 (float32x4_t b, mve_pred16_t p)
>  {
> -  return vminnmvq_p (a, b, p);
> +  return vminnmvq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminnmvt.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c
> index f257ddcf600..fe7368eeb38 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c
> index 957da71d0e3..a90a1db8835 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c
> index fea8bfd7994..911bd3af0dc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c
> index 7cc19a7dd5d..f80288aaf79 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vminq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vminq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c
> index 301fbfc751f..b480089f4f3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vminq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vminq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c
> index 7a65b3557a3..73633c9612e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vminq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vminq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c
> index d46a3c4ee18..eb34dc4c41c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmin.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vminq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmin.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vminq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c
> index 601e918a5bf..60d29da4e14 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmin.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vminq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmin.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vminq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
> index e2ae2341ad8..675fb8edfb1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmin.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vminq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmin.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vminq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
> index 3cac573f6ef..50f648d5133 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmin.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vminq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmin.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vminq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
> index ca3ef245fe9..bcfead39c5a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmin.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vminq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmin.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vminq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
> index b7ef4db22ff..e8eacae4da8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmin.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vminq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmin.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vminq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmin.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
> index af93c78658e..0d8987e16b8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
> index 76f0831e48e..3c3595171ea 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
> index fdd6e94497c..402c4aa121d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
> index 9842954c761..e27a3416e38 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vminq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vminq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
> index 741e4508879..d3cb29bf60c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vminq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vminq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
> index 13743fc87a1..3e05ef7dd13 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminq_x_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vminq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmint.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmint.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vminq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
> index 91bb63f6ba6..7c25c9d2f82 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo (int16_t a, int16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,20 @@ foo (int16_t a, int16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo1 (int16_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vminvq_p (a, b, p);
>  }
> 
> -
> -int16_t
> -foo2 (int8_t a, int16x8_t b, mve_pred16_t p)
> -{
> -  return vminvq_p (a, b, p);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminvt.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
> index a846701312c..d5f7418af38 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,20 @@ foo (int32_t a, int32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vminvq_p (a, b, p);
>  }
> 
> -
> -int32_t
> -foo2 (int16_t a, int32x4_t b, mve_pred16_t p)
> -{
> -  return vminvq_p (a, b, p);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminvt.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
> index 716d414f3a7..6a42170fc19 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_s8.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo (int8_t a, int8x16_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,20 @@ foo (int8_t a, int8x16_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo1 (int8_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vminvq_p (a, b, p);
>  }
> 
> -
> -int8_t
> -foo2 (int32_t a, int8x16_t b, mve_pred16_t p)
> -{
> -  return vminvq_p (a, b, p);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminvt.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
> index cc7f8fe8933..8f2f68fef84 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vminvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
> -foo2 (uint32_t a, uint16x8_t b, mve_pred16_t p)
> +foo2 (uint16x8_t b, mve_pred16_t p)
>  {
> -  return vminvq_p (a, b, p);
> +  return vminvq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminvt.u16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
> index 6bde0be29cc..9d14c39c1dc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vminvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo2 (uint8_t a, uint32x4_t b, mve_pred16_t p)
> +foo2 (uint32x4_t b, mve_pred16_t p)
>  {
> -  return vminvq_p (a, b, p);
> +  return vminvq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminvt.u32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
> index bb894904f3c..4c1f4406852 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_p_u8.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vminvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vminvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
> -foo2 (uint16_t a, uint8x16_t b, mve_pred16_t p)
> +foo2 (uint8x16_t b, mve_pred16_t p)
>  {
> -  return vminvq_p (a, b, p);
> +  return vminvq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminvt.u8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
> index 6d589aa4a05..e3242c0aa4d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo (int16_t a, int16x8_t b)
>  {
> @@ -11,17 +18,16 @@ foo (int16_t a, int16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo1 (int16_t a, int16x8_t b)
>  {
>    return vminvq (a, b);
>  }
> 
> -int16_t
> -foo2 (int8_t a, int16x8_t b)
> -{
> -  return vminvq (a, b);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminv.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
> index 7c727d6d92b..1325b38411d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int32x4_t b)
>  {
> @@ -11,17 +18,16 @@ foo (int32_t a, int32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int32x4_t b)
>  {
>    return vminvq (a, b);
>  }
> 
> -int32_t
> -foo2 (int8_t a, int32x4_t b)
> -{
> -  return vminvq (a, b);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminv.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
> index 76309482fc5..81c14a8ac6b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_s8.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo (int8_t a, int8x16_t b)
>  {
> @@ -11,17 +18,16 @@ foo (int8_t a, int8x16_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo1 (int8_t a, int8x16_t b)
>  {
>    return vminvq (a, b);
>  }
> 
> -int8_t
> -foo2 (int32_t a, int8x16_t b)
> -{
> -  return vminvq (a, b);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminv.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
> index 698975f456c..4372ac62388 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, uint16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint16_t a, uint16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, uint16x8_t b)
>  {
>    return vminvq (a, b);
>  }
> 
> -
> -uint8_t
> -foo2 (uint32_t a, uint16x8_t b)
> +/*
> +**foo2:
> +**	...
> +**	vminv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16_t
> +foo2 (uint16x8_t b)
>  {
> -  return vminvq (a, b);
> +  return vminvq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminv.u16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
> index 7489f81debf..aff3679f49d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b)
>  {
> @@ -11,17 +18,28 @@ foo (uint32_t a, uint32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b)
>  {
>    return vminvq (a, b);
>  }
> 
> +/*
> +**foo2:
> +**	...
> +**	vminv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo2 (uint16_t a, uint32x4_t b)
> +foo2 (uint32x4_t b)
>  {
> -  return vminvq (a, b);
> +  return vminvq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminv.u32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
> index aa2b986d558..883e5f2d2c7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vminvq_u8.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vminv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, uint8x16_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint8_t a, uint8x16_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vminv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, uint8x16_t b)
>  {
>    return vminvq (a, b);
>  }
> 
> -
> -uint16_t
> -foo2 (uint32_t a, uint8x16_t b)
> +/*
> +**foo2:
> +**	...
> +**	vminv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8_t
> +foo2 (uint8x16_t b)
>  {
> -  return vminvq (a, b);
> +  return vminvq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vminv.u8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 09/35] arm: improve tests for vmax*
  2022-11-17 16:37 ` [PATCH 09/35] arm: improve tests for vmax* Andrea Corallo
@ 2022-11-18 16:42   ` Kyrylo Tkachov
From: Kyrylo Tkachov @ 2022-11-18 16:42 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 09/35] arm: improve tests for vmax*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxaq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxaq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxaq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmaxaq_m_s16.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxaq_m_s32.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxaq_m_s8.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxaq_s16.c           | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxaq_s32.c           | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vmaxaq_s8.c | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxavq_p_s16.c        | 41 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmaxavq_p_s32.c        | 41 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmaxavq_p_s8.c         | 41 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmaxavq_s16.c          | 29 ++++++++++---
>  .../arm/mve/intrinsics/vmaxavq_s32.c          | 29 ++++++++++---
>  .../arm/mve/intrinsics/vmaxavq_s8.c           | 29 ++++++++++---
>  .../arm/mve/intrinsics/vmaxnmaq_f16.c         | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxnmaq_f32.c         | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxnmaq_m_f16.c       | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxnmaq_m_f32.c       | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxnmavq_f16.c        | 27 +++++++++---
>  .../arm/mve/intrinsics/vmaxnmavq_f32.c        | 27 +++++++++---
>  .../arm/mve/intrinsics/vmaxnmavq_p_f16.c      | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxnmavq_p_f32.c      | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxnmq_f16.c          | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxnmq_f32.c          | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxnmq_m_f16.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxnmq_m_f32.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxnmq_x_f16.c        | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxnmq_x_f32.c        | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxnmvq_f16.c         | 27 +++++++++---
>  .../arm/mve/intrinsics/vmaxnmvq_f32.c         | 27 +++++++++---
>  .../arm/mve/intrinsics/vmaxnmvq_p_f16.c       | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxnmvq_p_f32.c       | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxq_m_s16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxq_m_s32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxq_m_s8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxq_m_u16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxq_m_u32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmaxq_m_u8.c           | 26 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vmaxq_s16.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vmaxq_s32.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vmaxq_s8.c  | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vmaxq_u16.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vmaxq_u32.c | 16 +++++++-
>  .../gcc.target/arm/mve/intrinsics/vmaxq_u8.c  | 16 +++++++-
>  .../arm/mve/intrinsics/vmaxq_x_s16.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxq_x_s32.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxq_x_s8.c           | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxq_x_u16.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxq_x_u32.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxq_x_u8.c           | 25 +++++++++--
>  .../arm/mve/intrinsics/vmaxvq_p_s16.c         | 31 ++++++++++----
>  .../arm/mve/intrinsics/vmaxvq_p_s32.c         | 31 ++++++++++----
>  .../arm/mve/intrinsics/vmaxvq_p_s8.c          | 31 ++++++++++----
>  .../arm/mve/intrinsics/vmaxvq_p_u16.c         | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxvq_p_u32.c         | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxvq_p_u8.c          | 39 +++++++++++++++---
>  .../arm/mve/intrinsics/vmaxvq_s16.c           | 23 +++++++----
>  .../arm/mve/intrinsics/vmaxvq_s32.c           | 23 +++++++----
>  .../gcc.target/arm/mve/intrinsics/vmaxvq_s8.c | 23 +++++++----
>  .../arm/mve/intrinsics/vmaxvq_u16.c           | 27 +++++++++---
>  .../arm/mve/intrinsics/vmaxvq_u32.c           | 27 +++++++++---
>  .../gcc.target/arm/mve/intrinsics/vmaxvq_u8.c | 27 +++++++++---
>  60 files changed, 1318 insertions(+), 257 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c
> index 48d213277df..4c487ed7f60 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxaq_m_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxat.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxat.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c
> index 49273819861..5156467f0c1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxaq_m_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxat.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c
> index 5ecdb2c19dc..6564bd88c9b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxaq_m_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxat.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxat.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c
> index f9a9f896aa2..6cabf9f723b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxa.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int16x8_t b)
>  {
>    return vmaxaq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxa.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxa.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int16x8_t b)
>  {
>    return vmaxaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxa.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c
> index efe2fc16ff7..d0dd3c23600 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxa.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32x4_t b)
>  {
>    return vmaxaq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxa.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxa.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32x4_t b)
>  {
>    return vmaxaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxa.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c
> index 5c2e35f71a6..a7344638dcf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxaq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxa.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int8x16_t b)
>  {
>    return vmaxaq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxa.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxa.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int8x16_t b)
>  {
>    return vmaxaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxa.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c
> index 74ffad4e726..ac81c8fd1bd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, int16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint16_t a, int16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxavq_p (a, b, p);
>  }
> 
> -
> -int16_t
> -foo2 (uint8_t a, int16x8_t b, mve_pred16_t p)
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16_t
> +foo2 (int16x8_t b, mve_pred16_t p)
>  {
> -  return vmaxavq_p (a, b, p);
> +  return vmaxavq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxavt.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c
> index 40800b0f12e..119c0c34c76 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint32_t a, int32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxavq_p (a, b, p);
>  }
> 
> -
> -int32_t
> -foo2 (uint16_t a, int32x4_t b, mve_pred16_t p)
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int32x4_t b, mve_pred16_t p)
>  {
> -  return vmaxavq_p (a, b, p);
> +  return vmaxavq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxavt.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c
> index 7638737fb84..dfd7f828ef6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, int8x16_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint8_t a, int8x16_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxavq_p (a, b, p);
>  }
> 
> -
> -int8_t
> -foo2 (uint32_t a, int8x16_t b, mve_pred16_t p)
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8_t
> +foo2 (int8x16_t b, mve_pred16_t p)
>  {
> -  return vmaxavq_p (a, b, p);
> +  return vmaxavq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxavt.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c
> index 0dca149b3e8..9f59e8e4542 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, int16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint16_t a, int16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, int16x8_t b)
>  {
>    return vmaxavq (a, b);
>  }
> 
> -
> -int16_t
> -foo2 (uint8_t a, int16x8_t b)
> +/*
> +**foo2:
> +**	...
> +**	vmaxav.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16_t
> +foo2 (int16x8_t b)
>  {
> -  return vmaxavq (a, b);
> +  return vmaxavq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxav.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c
> index f419a771017..716b8a2a979 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint32_t a, int32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b)
>  {
>    return vmaxavq (a, b);
>  }
> 
> -
> -int32_t
> -foo2 (uint16_t a, int32x4_t b)
> +/*
> +**foo2:
> +**	...
> +**	vmaxav.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int32x4_t b)
>  {
> -  return vmaxavq (a, b);
> +  return vmaxavq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxav.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c
> index 214ad88f4aa..0f1a87af54b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxavq_s8.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, int8x16_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint8_t a, int8x16_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, int8x16_t b)
>  {
>    return vmaxavq (a, b);
>  }
> 
> -
> -int8_t
> -foo2 (uint32_t a, int8x16_t b)
> +/*
> +**foo2:
> +**	...
> +**	vmaxav.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8_t
> +foo2 (int8x16_t b)
>  {
> -  return vmaxavq (a, b);
> +  return vmaxavq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxav.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c
> index f19707125db..cd4c813bf3b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vmaxnmaq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnma.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnma.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vmaxnmaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnma.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c
> index 94fc3a2aa28..527466fc131 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vmaxnmaq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnma.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnma.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vmaxnmaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnma.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c
> index b2e82f5464c..39c68cdc172 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmaq_m_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmat.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmat.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c
> index 8fa7344b054..f6f8bf07549 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmaq_m_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmat.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmat.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmaq_m (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c
> index 6d8cf19a341..4c1f20be036 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b)
>  {
>    return vmaxnmavq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxnmav.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b)
> +foo2 (float16x8_t b)
>  {
> -  return vmaxnmavq (a, b);
> +  return vmaxnmavq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmav.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c
> index ef79030d8eb..86087335cea 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b)
>  {
>    return vmaxnmavq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxnmav.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b)
> +foo2 (float32x4_t b)
>  {
> -  return vmaxnmavq (a, b);
> +  return vmaxnmavq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmav.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
> index f7f39f59dad..a4973567d5e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmavq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmavt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
> +foo2 (float16x8_t b, mve_pred16_t p)
>  {
> -  return vmaxnmavq_p (a, b, p);
> +  return vmaxnmavq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmavt.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
> index 341f6254a5a..b229cb3a322 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmavq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmavt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
> +foo2 (float32x4_t b, mve_pred16_t p)
>  {
> -  return vmaxnmavq_p (a, b, p);
> +  return vmaxnmavq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmavt.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
> index 59a8070e07b..faf968ebb21 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vmaxnmq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnm.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnm.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vmaxnmq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnm.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
> index 5db42bd4b8c..f7ee01b1f14 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vmaxnmq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnm.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnm.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vmaxnmq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmaxnm.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
> index 4668fd03c9d..ee3444393ed 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_m_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmt.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
> index 9e8ccbc84b7..5d434432856 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_m_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmt.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
> index ecca6069d22..dad76734fd8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_x_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
> index c3965dda4f1..2fe8c0d4f3d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_x_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxnmt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
> index 80bd1d4cda1..9787cc1ba90 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float16_t a, float16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b)
>  {
>    return vmaxnmvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxnmv.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b)
> +foo2 (float16x8_t b)
>  {
> -  return vmaxnmvq (a, b);
> +  return vmaxnmvq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmv.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
> index bb2fc46f88a..b1191876850 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (float32_t a, float32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b)
>  {
>    return vmaxnmvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxnmv.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b)
> +foo2 (float32x4_t b)
>  {
> -  return vmaxnmvq (a, b);
> +  return vmaxnmvq (1.1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmv.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
> index 3efe203007b..0b1740d5ed2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float16_t a, float16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
>  foo1 (float16_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmaxnmvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmvt.f16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16_t
> -foo2 (float32_t a, float16x8_t b, mve_pred16_t p)
> +foo2 (float16x8_t b, mve_pred16_t p)
>  {
> -  return vmaxnmvq_p (a, b, p);
> +  return vmaxnmvq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmvt.f16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
> index 6c13247f1f1..ca6ad91d24d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (float32_t a, float32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
>  foo1 (float32_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmaxnmvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxnmvt.f32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32_t
> -foo2 (float16_t a, float32x4_t b, mve_pred16_t p)
> +foo2 (float32x4_t b, mve_pred16_t p)
>  {
> -  return vmaxnmvq_p (a, b, p);
> +  return vmaxnmvq_p (1.1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxnmvt.f32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
> index 2791ed4c562..548824fc58a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
> index 27f7d5d7b16..e935729b47d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
> index 23b7569f720..8028fa031c7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
> index 61e51e3b830..e872f9e72f8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
> index 23df7eeaed6..76606555881 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
> index 138d5c87894..7ade467cafd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
> index a42fc82a852..bf547a2420d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmax.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vmaxq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmax.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vmaxq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
> index 14c094a5d11..25bb950c0bf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmax.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vmaxq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmax.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vmaxq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
> index 0540a27bae9..33057f1a58e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmax.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vmaxq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmax.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vmaxq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
> index 6b9b5a73bcd..7717a9a5057 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmax.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vmaxq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmax.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vmaxq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
> index 3112302bf1a..36b5c276cfe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmax.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vmaxq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmax.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vmaxq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
> index b1baa5083bd..e643e5f3e3c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmax.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vmaxq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmax.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vmaxq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmax.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
> index 9d92f2ccd85..a32feb0d7cd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
> index 200fd4b1bb1..3ac1994c4f8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
> index 2fe752558b9..c9ba33d1504 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
> index 967622e331c..954a9e2f02a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmaxq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
> index 56b5d8fa8b8..022d418af84 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmaxq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
> index 1816f959dd7..7e1687a8b72 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmaxt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmaxq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
> index 657efc51bea..a97703eb58c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo (int16_t a, int16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,20 @@ foo (int16_t a, int16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo1 (int16_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmaxvq_p (a, b, p);
>  }
> 
> -
> -int16_t
> -foo2 (int8_t a, int16x8_t b, mve_pred16_t p)
> -{
> -  return vmaxvq_p (a, b, p);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxvt.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
> index 5882351c0fa..b4bddcb8312 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,20 @@ foo (int32_t a, int32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmaxvq_p (a, b, p);
>  }
> 
> -
> -int32_t
> -foo2 (int16_t a, int32x4_t b, mve_pred16_t p)
> -{
> -  return vmaxvq_p (a, b, p);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxvt.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
> index 3737ecd3307..ee8c3e9155f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo (int8_t a, int8x16_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,20 @@ foo (int8_t a, int8x16_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo1 (int8_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmaxvq_p (a, b, p);
>  }
> 
> -
> -int8_t
> -foo2 (int32_t a, int8x16_t b, mve_pred16_t p)
> -{
> -  return vmaxvq_p (a, b, p);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxvt.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
> index 348cf39caa0..906adf85936 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint16_t a, uint16x8_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmaxvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
> -foo2 (uint32_t a, uint16x8_t b, mve_pred16_t p)
> +foo2 (uint16x8_t b, mve_pred16_t p)
>  {
> -  return vmaxvq_p (a, b, p);
> +  return vmaxvq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxvt.u16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
> index f2e976216c5..acc5367c5a2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmaxvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo2 (uint8_t a, uint32x4_t b, mve_pred16_t p)
> +foo2 (uint32x4_t b, mve_pred16_t p)
>  {
> -  return vmaxvq_p (a, b, p);
> +  return vmaxvq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxvt.u32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
> index 7df5b63c9bc..358cb40f829 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c
> @@ -1,9 +1,20 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
>  {
> @@ -11,18 +22,36 @@ foo (uint8_t a, uint8x16_t b, mve_pred16_t p)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmaxvq_p (a, b, p);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmaxvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
> -foo2 (uint16_t a, uint8x16_t b, mve_pred16_t p)
> +foo2 (uint8x16_t b, mve_pred16_t p)
>  {
> -  return vmaxvq_p (a, b, p);
> +  return vmaxvq_p (1, b, p);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxvt.u8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
> index 8412452cf33..485355a7d72 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo (int16_t a, int16x8_t b)
>  {
> @@ -11,18 +18,16 @@ foo (int16_t a, int16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16_t
>  foo1 (int16_t a, int16x8_t b)
>  {
>    return vmaxvq (a, b);
>  }
> 
> -
> -int16_t
> -foo2 (int8_t a, int16x8_t b)
> -{
> -  return vmaxvq (a, b);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxv.s16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
> index 09f4909c9a8..3b9075689a0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int32x4_t b)
>  {
> @@ -11,18 +18,16 @@ foo (int32_t a, int32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int32x4_t b)
>  {
>    return vmaxvq (a, b);
>  }
> 
> -
> -int32_t
> -foo2 (int16_t a, int32x4_t b)
> -{
> -  return vmaxvq (a, b);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxv.s32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
> index a087bbc6b64..f13a0168d9d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_s8.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo (int8_t a, int8x16_t b)
>  {
> @@ -11,18 +18,16 @@ foo (int8_t a, int8x16_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8_t
>  foo1 (int8_t a, int8x16_t b)
>  {
>    return vmaxvq (a, b);
>  }
> 
> -
> -int8_t
> -foo2 (int32_t a, int8x16_t b)
> -{
> -  return vmaxvq (a, b);
> -}
> -
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxv.s8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
> index 47fe0d1cf0f..6a0fe254043 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u16.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo (uint16_t a, uint16x8_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint16_t a, uint16x8_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
>  foo1 (uint16_t a, uint16x8_t b)
>  {
>    return vmaxvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16_t
> -foo2 (uint32_t a, uint16x8_t b)
> +foo2 (uint16x8_t b)
>  {
> -  return vmaxvq (a, b);
> +  return vmaxvq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxv.u16" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
> index aa723daf5dd..eed20046e53 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u32.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint32_t a, uint32x4_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b)
>  {
>    return vmaxvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo2 (uint8_t a, uint32x4_t b)
> +foo2 (uint32x4_t b)
>  {
> -  return vmaxvq (a, b);
> +  return vmaxvq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxv.u32" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
> index 3aae785040c..d44a6d3bb02 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmaxvq_u8.c
> @@ -1,9 +1,16 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmaxv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo (uint8_t a, uint8x16_t b)
>  {
> @@ -11,18 +18,28 @@ foo (uint8_t a, uint8x16_t b)
>  }
> 
> 
> +/*
> +**foo1:
> +**	...
> +**	vmaxv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
>  foo1 (uint8_t a, uint8x16_t b)
>  {
>    return vmaxvq (a, b);
>  }
> 
> -
> +/*
> +**foo2:
> +**	...
> +**	vmaxv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8_t
> -foo2 (uint16_t a, uint8x16_t b)
> +foo2 (uint8x16_t b)
>  {
> -  return vmaxvq (a, b);
> +  return vmaxvq (1, b);
>  }
> 
> -/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> -/* { dg-final { scan-assembler-times "vmaxv.u8" 3 } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 10/35] arm: improve tests for vabavq*
  2022-11-17 16:37 ` [PATCH 10/35] arm: improve tests for vabavq* Andrea Corallo
@ 2022-11-18 16:43   ` Kyrylo Tkachov
  2022-11-21 14:49     ` Andrea Corallo
  0 siblings, 1 reply; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:43 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 10/35] arm: improve tests for vabavq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vabavq_p_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_p_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_p_s8.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_p_u16.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_p_u32.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_p_u8.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_s8.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_u16.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_u32.c:
> 	* gcc.target/arm/mve/intrinsics/vabavq_u8.c:

Is the ChangeLog text missing? Each entry above ends at the colon with no description.
OK with the ChangeLog fixed.
Thanks,
Kyrill
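
For illustration only, a completed entry in the usual GNU ChangeLog style could look like the sketch below. The wording "Improve test." is an assumption and is not taken from this thread; "Likewise." is the conventional shorthand for repeating the previous description.

```
gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vabavq_p_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vabavq_p_s32.c: Likewise.
```

The remaining files in the list would follow the same pattern, one line per file.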

> ---
>  .../arm/mve/intrinsics/vabavq_p_s16.c         | 40 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vabavq_p_s32.c         | 40 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vabavq_p_s8.c          | 40 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vabavq_p_u16.c         | 40 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vabavq_p_u32.c         | 40 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vabavq_p_u8.c          | 40 ++++++++++++++++++-
>  .../arm/mve/intrinsics/vabavq_s16.c           | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vabavq_s32.c           | 28 ++++++++++++-
>  .../gcc.target/arm/mve/intrinsics/vabavq_s8.c | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vabavq_u16.c           | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vabavq_u32.c           | 28 ++++++++++++-
>  .../gcc.target/arm/mve/intrinsics/vabavq_u8.c | 28 ++++++++++++-
>  12 files changed, 384 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> index 78ac801fa3c..843d022c418 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
>  {
>    return vabavq_p_s16 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int16x8_t b, int16x8_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> index af4e30b6127..6ed9b9ac1c4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>    return vabavq_p_s32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int32x4_t b, int32x4_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
> index a76b6bd4bda..ec34be92a28 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
>  {
>    return vabavq_p_s8 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int8x16_t b, int8x16_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
> index 9627a00b812..440b603a18e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
>  {
>    return vabavq_p_u16 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint16x8_t b, uint16x8_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
> index 298c2c38101..9500ee054b1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>    return vabavq_p_u32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint32x4_t b, uint32x4_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
> index 775072225f8..40c9a51fbe4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
>  {
>    return vabavq_p_u8 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabavt.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint8x16_t b, uint8x16_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
> index c2383f1865b..27684fa4a88 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabav.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int16x8_t b, int16x8_t c)
>  {
>    return vabavq_s16 (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabav.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int16x8_t b, int16x8_t c)
>  {
>    return vabavq (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.s16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vabav.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int16x8_t b, int16x8_t c)
> +{
> +  return vabavq (1, b, c);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
> index 7170d013c3b..f595609a2a0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabav.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b, int32x4_t c)
>  {
>    return vabavq_s32 (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabav.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b, int32x4_t c)
>  {
>    return vabavq (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.s32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vabav.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int32x4_t b, int32x4_t c)
> +{
> +  return vabavq (1, b, c);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
> index d75ecdbdbdf..60fa9e23b7b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabav.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, int8x16_t b, int8x16_t c)
>  {
>    return vabavq_s8 (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabav.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int8x16_t b, int8x16_t c)
>  {
>    return vabavq (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.s8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vabav.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (int8x16_t b, int8x16_t c)
> +{
> +  return vabavq (1, b, c);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
> index 40ab94d9083..f3255276eda 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabav.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint16x8_t b, uint16x8_t c)
>  {
>    return vabavq_u16 (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabav.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint16x8_t b, uint16x8_t c)
>  {
>    return vabavq (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vabav.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint16x8_t b, uint16x8_t c)
> +{
> +  return vabavq (1, b, c);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
> index 4b9f5c32f3d..f41fa1f3952 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabav.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b, uint32x4_t c)
>  {
>    return vabavq_u32 (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabav.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b, uint32x4_t c)
>  {
>    return vabavq (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vabav.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint32x4_t b, uint32x4_t c)
> +{
> +  return vabavq (1, b, c);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
> index 3638e9d7106..3a2654435df 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabav.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint8x16_t b, uint8x16_t c)
>  {
>    return vabavq_u8 (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabav.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint8x16_t b, uint8x16_t c)
>  {
>    return vabavq (a, b, c);
>  }
> 
> -/* { dg-final { scan-assembler "vabav.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vabav.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint8x16_t b, uint8x16_t c)
> +{
> +  return vabavq (1, b, c);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 11/35] arm: improve tests for vabdq*
  2022-11-17 16:37 ` [PATCH 11/35] arm: improve tests for vabdq* Andrea Corallo
@ 2022-11-18 16:44   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:44 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 11/35] arm: improve tests for vabdq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vabdq_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vabdq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabdq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../gcc.target/arm/mve/intrinsics/vabdq_f16.c | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_f32.c | 16 ++++++++++--
>  .../arm/mve/intrinsics/vabdq_m_f16.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_f32.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_s16.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_s32.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_s8.c           | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_u16.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_u32.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_m_u8.c           | 26 ++++++++++++++++---
>  .../gcc.target/arm/mve/intrinsics/vabdq_s16.c | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_s32.c | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_s8.c  | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u16.c | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u32.c | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u8.c  | 16 ++++++++++--
>  .../arm/mve/intrinsics/vabdq_x_f16.c          | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_f32.c          | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_s16.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_s32.c          | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_s8.c           | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_u16.c          | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_u32.c          | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vabdq_x_u8.c           | 25 +++++++++++++++---
>  24 files changed, 464 insertions(+), 73 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> index b55e826e4b6..f379b25c49e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vabdq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> index f1a95b14e03..3ba808e0b4d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vabdq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
> index f92e671edec..903c6dfe861 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_m_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
> index 5e30997c997..4ddf4ee5c61 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_m_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
> index 35809895dea..c719a0b5e9c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
> index 77d97e1db63..048554144cd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
> index a0004d9f290..458b920b5cb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
> index c4dc9a469da..8e163edb153 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
> index 18a64d3a19d..619d4706dc5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
> index 494f39cb857..079478df08a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
> index 73773ac9ebc..0dce4c482ac 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vabdq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
> index 3c552a2969e..f5908fe81d8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vabdq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
> index f7de6f707ac..3f249e1a622 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vabdq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
> index 90d1c873cca..16a4b930d2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vabdq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
> index 405dca51466..2b5ee12945c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vabdq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
> index 2b693c16520..50a4c162c9b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vabdq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
> index 9d771a3325f..da142f4394b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_x_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
> index 498851348d5..1ff1bef258f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_x_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
> index 1fa77cc5cae..6733e2bcc14 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
> index 24a62702482..8d7631b9ac6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
> index f96c2dfd147..90784c1d389 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
> index 820b8416330..f376374564a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
> index 2d81930348a..d9467a1ccd7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
> index 7f956850b52..1ea3713d12b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_x_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabdt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabdt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vabdq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 12/35] arm: improve tests and fix vabsq*
  2022-11-17 16:37 ` [PATCH 12/35] arm: improve tests and fix vabsq* Andrea Corallo
@ 2022-11-18 16:45   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:45 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 12/35] arm: improve tests and fix vabsq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vabsq_f<mode>): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vabsq_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vabsq_x_s8.c: Likewise.

Ok.
Thanks,
Kyrill
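
A note on the mve.md hunk quoted below: it replaces the two spaces after
the vabs.f mnemonic with a tab. This matters because the new
check-function-bodies patterns expect a tab between the mnemonic and its
operands. What follows is a minimal sketch of that matching, not the
testsuite's actual implementation. It assumes each "**" line is applied as
a regular expression to one line of assembly and that the whitespace in
the patterns is a tab character. The helper name matches_body_line is
hypothetical, and Python's re module stands in for the Tcl regexps that
DejaGnu uses.

```python
import re

# Pattern modelled on the vabsq_f16.c expectation; separators assumed to be tabs.
PATTERN = r"\tvabs.f16\tq[0-9]+, q[0-9]+(?:\t@.*|)"


def matches_body_line(pattern: str, asm_line: str) -> bool:
    """Hypothetical helper: whole-line regex match, standing in for the Tcl matching."""
    return re.fullmatch(pattern, asm_line) is not None


# Two spaces after the mnemonic, as the old output template would print it.
old_output = "\tvabs.f16  q0, q0"
# A tab after the mnemonic, as the fixed output template prints it.
new_output = "\tvabs.f16\tq0, q0"
# The optional trailing group also tolerates an assembler comment.
commented_output = "\tvabs.f16\tq0, q0\t@ comment"

for line in (old_output, new_output, commented_output):
    print(repr(line), matches_body_line(PATTERN, line))
```

Under these assumptions only the tab-separated forms match, which is why
the test improvement and the spacing fix travel together in this patch.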

> ---
>  gcc/config/arm/mve.md                         |  2 +-
>  .../gcc.target/arm/mve/intrinsics/vabsq_f16.c | 22 +++++++++++++++-
>  .../gcc.target/arm/mve/intrinsics/vabsq_f32.c | 22 +++++++++++++++-
>  .../arm/mve/intrinsics/vabsq_m_f16.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_m_f32.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_m_s16.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_m_s32.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_m_s8.c           | 25 ++++++++++++++++---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s16.c | 20 ++++++++++++---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s32.c | 20 ++++++++++++---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s8.c  | 16 ++++++++++--
>  .../arm/mve/intrinsics/vabsq_x_f16.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_x_f32.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_x_s16.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_x_s32.c          | 25 ++++++++++++++++---
>  .../arm/mve/intrinsics/vabsq_x_s8.c           | 25 ++++++++++++++++---
>  16 files changed, 309 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 3330a220aea..bc4e2f2ac21 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -279,7 +279,7 @@ (define_insn "mve_vabsq_f<mode>"
>  	(abs:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vabs.f%#<V_sz_elem>  %q0, %q1"
> +  "vabs.f%#<V_sz_elem>\t%q0, %q1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> index 08e141baedc..f29ada8c058 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> @@ -1,13 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabs.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a)
>  {
>    return vabsq_f16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.f16"  }  } */
> +
> +/*
> +**foo1:
> +**	...
> +**	vabs.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo1 (float16x8_t a)
> +{
> +  return vabsq (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> index 3614a44fbdc..cc24744fb26 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> @@ -1,13 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabs.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a)
>  {
>    return vabsq_f32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.f32"  }  } */
> +
> +/*
> +**foo1:
> +**	...
> +**	vabs.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo1 (float32x4_t a)
> +{
> +  return vabsq (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
> index 30c14a151af..21cf284d045 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_m_f16 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
> index 652056aa98c..236830b3a9e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_m_f32 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
> index 2dcf488bd0d..22f7b37b30b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_m_s16 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
> index 183909fef93..b3021edf52b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_m_s32 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
> index cd17974838e..da9ff2f978a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_m_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, mve_pred16_t p)
>  {
>    return vabsq_m_s8 (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, mve_pred16_t p)
>  {
>    return vabsq_m (inactive, a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
> index 243afebc38c..84906302c8a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s16.c
> @@ -1,21 +1,33 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabs.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a)
>  {
>    return vabsq_s16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabs.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a)
>  {
>    return vabsq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
> index d9843503a48..117c787d595 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s32.c
> @@ -1,21 +1,33 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabs.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a)
>  {
>    return vabsq_s32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabs.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a)
>  {
>    return vabsq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
> index 93bf1520dd3..a7f1413505c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vabs.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a)
>  {
>    return vabsq_s8 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vabs.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a)
>  {
>    return vabsq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
> index d1fc7002ccb..f24a8cccb53 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_x_f16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_x (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
> index 0beccac030d..fd4c2277969 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_f32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_x_f32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.f32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_x (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
> index fd67fd5ccac..0e1d1bb94d4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_x_s16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s16	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, mve_pred16_t p)
>  {
>    return vabsq_x (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
> index 22d561d1e46..64d0e4b574d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_x_s32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s32	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, mve_pred16_t p)
>  {
>    return vabsq_x (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
> index 6908a6ca20c..742bc701fae 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, mve_pred16_t p)
>  {
>    return vabsq_x_s8 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vabst.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vabst.s8	q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, mve_pred16_t p)
>  {
>    return vabsq_x (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic
  2022-11-17 16:37 ` [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic Andrea Corallo
@ 2022-11-18 16:49   ` Kyrylo Tkachov
  2022-11-21 10:45     ` Stam Markianos-Wright
  0 siblings, 1 reply; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:49 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Stam Markianos-Wright



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
> Wright@arm.com>
> Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n
> intrinsic
> 
> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
> 
> It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
> resolution would fall back to the `__ARM_undef` failure state.
> 
> This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79` and
> `6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
> identified, because the tests were not checking for this kind of failure.
> 
> The above commits changed the definitions of the intrinsics from using
> `[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
> allowed `int` to be supported in user code through the overloaded
> `#defines`, but seems to have broken the `[u]int[8/16/32]_t` types.
> 
> The solution implemented by this patch is to add a new _Generic mapping
> that explicitly accepts all the `[u]int[8/16/32]_t` types as `int`. With this
> change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
> user code and are handled by the overloading mechanism correctly.
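
A minimal sketch of the failure mode described above, written for a generic host compiler and not taken from arm_mve.h. `DEMO_COERCE_STRICT` and `DEMO_COERCE_WIDE` are hypothetical stand-ins for `__ARM_mve_coerce` and the new `__ARM_mve_coerce3`: they yield 1 where the real macros would pass the parameter through and 0 where they would fall back to `*(type *)__ARM_undef`. The sketch lists the basic C integer types instead of the `[u]int[8/16/32]_t` typedefs, because on some hosts `int32_t` is the same type as `int`, and a _Generic association list must not name the same type twice.

```c
#include <assert.h>

/* Hypothetical stand-in for __ARM_mve_coerce (param, int): only a
   parameter whose type is exactly `int` is accepted.  */
#define DEMO_COERCE_STRICT(param) \
  _Generic ((param), int: 1, default: 0)

/* Hypothetical stand-in for __ARM_mve_coerce3 (param, int): every
   listed integer type is accepted, anything else takes the default.  */
#define DEMO_COERCE_WIDE(param) \
  _Generic ((param), int: 1, signed char: 1, short: 1, long: 1, \
            long long: 1, unsigned char: 1, unsigned short: 1, \
            unsigned int: 1, unsigned long: 1, unsigned long long: 1, \
            default: 0)

/* A narrow scalar does not match `int`: _Generic compares types
   exactly and applies no integer promotion to the controlling
   expression.  */
static_assert (DEMO_COERCE_STRICT ((int) 1) == 1, "int is accepted");
static_assert (DEMO_COERCE_STRICT ((signed char) 1) == 0,
               "signed char takes the default branch");

/* With the wider association list both spellings are accepted, while
   a non-integer type still takes the default branch.  */
static_assert (DEMO_COERCE_WIDE ((int) 1) == 1, "int is accepted");
static_assert (DEMO_COERCE_WIDE ((signed char) 1) == 1,
               "signed char is accepted");
static_assert (DEMO_COERCE_WIDE (1.0f) == 0, "float is rejected");
```

In the real header the default branch does not produce 0 but a reference to the undefined `__ARM_undef` symbol, which is why the reworked tests add a `scan-assembler-not "__ARM_undef"` check.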
> 
> gcc/ChangeLog:
> 
>         * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
>         (__arm_vaddq_m_n_s32): Likewise.
>         (__arm_vaddq_m_n_s16): Likewise.
>         (__arm_vaddq_m_n_u8): Likewise.
>         (__arm_vaddq_m_n_u32): Likewise.
>         (__arm_vaddq_m_n_u16): Likewise.
>         (__arm_vaddq_m): Fix overloading.
>         (__ARM_mve_coerce3): New.

Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit message?
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h | 78 ++++++++++++++++++++--------------------
>  1 file changed, 40 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 684f997520f..951dc25374b 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pr
> 
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
>  {
>    return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b,
> mve_pred16_t __p)
>  {
>    return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b,
> mve_pred16_t __p)
>  {
>    return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
> mve_pred16_t __p)
>  {
>    return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t
> __b, mve_pred16_t __p)
>  {
>    return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t
> __b, mve_pred16_t __p)
>  {
>    return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
>  }
> @@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pred16
> 
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_s32 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_s16 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_u8 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_u32 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_u16 (__inactive, __a, __b, __p);
>  }
> @@ -35657,6 +35657,8 @@ extern void *__ARM_undef;
>      _Generic(param, type: param, const type: param, default: *(type
> *)__ARM_undef)
>  #define __ARM_mve_coerce2(param, type) \
>      _Generic(param, type: param, float16_t: param, float32_t: param, default:
> *(type *)__ARM_undef)
> +#define __ARM_mve_coerce3(param, type) \
> +    _Generic(param, type: param, int8_t: param, int16_t: param, int32_t:
> param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param,
> uint64_t: param, default: *(type *)__ARM_undef)
> 
>  #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> 
> @@ -35871,14 +35873,14 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vaddq_f16 (__ARM_mve_coerce(p0, float16x8_t),
> __ARM_mve_coerce(p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vaddq_f32 (__ARM_mve_coerce(p0, float32x4_t),
> __ARM_mve_coerce(p1, float32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int)), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vaddq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vaddq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)));})
> 
> @@ -37316,12 +37318,12 @@ extern void *__ARM_undef;
>    int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_uint32x4_t]: __arm_vaddq_m_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>    int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_float16x8_t]: __arm_vaddq_m_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_float32x4_t]: __arm_vaddq_m_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int), p3), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int), p3), \
> -  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
>    int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> 
> @@ -38820,12 +38822,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int)));})
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39641,12 +39643,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int), p3), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int), p3), \
> -  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int8x16_t]: __arm_vaddq_m_s8 (__ARM_mve_coerce(__p0,
> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t), p3), \
>    int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int16x8_t]: __arm_vaddq_m_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int32x4_t]: __arm_vaddq_m_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t), p3), \
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters
  2022-11-17 16:37 ` [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters Andrea Corallo
@ 2022-11-18 16:51   ` Kyrylo Tkachov
  2022-11-21 10:46     ` Stam Markianos-Wright
  0 siblings, 1 reply; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:51 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Stam Markianos-Wright



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright
> <Stam.Markianos-Wright@arm.com>
> Subject: [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic
> scalar parameters
> 
> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
> 
> This is a mechanical patch that propagates the change proposed in my
> previous patch for vaddq[_m]_n across all other polymorphic MVE
> intrinsic overloads of scalar types.
> 
> The find-and-replace patterns used were:
> 
> s/__ARM_mve_coerce\(__p(\d+), [u]?int(8|16|32|64)_t\)
> /__ARM_mve_coerce3(p$1, int)/g
> 
> s/__ARM_mve_coerce2\(__p(\d+), double\)
> /__ARM_mve_coerce2(p$1, double)/g
> 
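For readers not familiar with the macros these patterns touch, here is a
minimal sketch of the selection scheme visible in the quoted diff: a
_Generic over a pointer-to-array type whose bounds encode the type id of
each argument. Everything named below (TYPEID, ID_*, vec_s32, sketch_*)
is hypothetical and only for illustration; it is not the arm_mve.h code
and does not reproduce __ARM_mve_coerce3, whose definition is not shown
in this message. The sketch assumes a C11 compiler that accepts a generic
selection as an array bound, as the header itself relies on.

```c
#include <stdint.h>

/* Hypothetical type ids; they stand in for the __ARM_mve_type_* values.
   Ids start at 1 because they are used as array bounds below.  */
enum { ID_VEC_S32 = 1, ID_INT_N = 2 };

/* Plain struct standing in for int32x4_t so this builds without MVE.  */
typedef struct { int32_t lane[4]; } vec_s32;

/* Classify an expression: the vector type, or "some integer scalar".
   There is no default, so an unlisted type is a compile-time error.  */
#define TYPEID(x) _Generic ((x), \
  vec_s32: ID_VEC_S32, \
  signed char: ID_INT_N, unsigned char: ID_INT_N, \
  short: ID_INT_N, unsigned short: ID_INT_N, \
  int: ID_INT_N, unsigned int: ID_INT_N, \
  long: ID_INT_N, unsigned long: ID_INT_N, \
  long long: ID_INT_N, unsigned long long: ID_INT_N)

/* Vector + vector variant.  */
static vec_s32 sketch_vaddq_s32 (vec_s32 a, vec_s32 b)
{
  for (int i = 0; i < 4; i++)
    a.lane[i] += b.lane[i];
  return a;
}

/* Vector + scalar ("_n") variant.  */
static vec_s32 sketch_vaddq_n_s32 (vec_s32 a, int32_t b)
{
  for (int i = 0; i < 4; i++)
    a.lane[i] += b;
  return a;
}

/* The pair of ids becomes the two array bounds, so each combination of
   argument kinds is a distinct pointer type that _Generic can match.  */
#define sketch_vaddq(p0, p1) \
  _Generic ((int (*)[TYPEID (p0)][TYPEID (p1)]) 0, \
    int (*)[ID_VEC_S32][ID_VEC_S32]: sketch_vaddq_s32, \
    int (*)[ID_VEC_S32][ID_INT_N]: sketch_vaddq_n_s32) ((p0), (p1))
```

With this, sketch_vaddq (v, w) picks the vector variant, while
sketch_vaddq (v, 7) or sketch_vaddq (v, s) with a short s picks the _n
variant. Unlike the header, the sketch selects a function and then calls
it, so only the chosen variant sees the arguments and no coerce step is
needed.
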
> gcc/ChangeLog:
> 
>         * config/arm/arm_mve.h (__arm_vaddq): Fix Overloading.
>         (__arm_vmulq): Likewise.
>         (__arm_vcmpeqq): Likewise.
>         (__arm_vcmpneq): Likewise.
>         (__arm_vmaxnmavq): Likewise.
>         (__arm_vmaxnmvq): Likewise.
>         (__arm_vminnmavq): Likewise.
>         (__arm_vsubq): Likewise.
>         (__arm_vminnmvq): Likewise.
>         (__arm_vrshlq): Likewise.
>         (__arm_vqsubq): Likewise.
>         (__arm_vqdmulltq): Likewise.
>         (__arm_vqdmullbq): Likewise.
>         (__arm_vqdmulhq): Likewise.
>         (__arm_vqaddq): Likewise.
>         (__arm_vhaddq): Likewise.
>         (__arm_vhsubq): Likewise.
>         (__arm_vqdmlashq): Likewise.
>         (__arm_vqrdmlahq): Likewise.
>         (__arm_vmlasq): Likewise.
>         (__arm_vqdmlahq): Likewise.
>         (__arm_vmaxnmavq_p): Likewise.
>         (__arm_vmaxnmvq_p): Likewise.
>         (__arm_vminnmavq_p): Likewise.
>         (__arm_vminnmvq_p): Likewise.
>         (__arm_vfmasq_m): Likewise.
>         (__arm_vsetq_lane): Likewise.
>         (__arm_vcmpneq_m): Likewise.
>         (__arm_vhaddq_x): Likewise.
>         (__arm_vhsubq_x): Likewise.
>         (__arm_vqrdmlashq_m): Likewise.
>         (__arm_vqdmlashq_m): Likewise.
>         (__arm_vmlaldavaxq_p): Likewise.
>         (__arm_vmlasq_m): Likewise.
>         (__arm_vqdmulhq_m): Likewise.
>         (__arm_vqdmulltq_m): Likewise.
>         (__arm_viwdupq_m): Likewise.
>         (__arm_viwdupq_u16): Likewise.
>         (__arm_viwdupq_u32): Likewise.
>         (__arm_viwdupq_u8): Likewise.
>         (__arm_vdwdupq_m): Likewise.
>         (__arm_vdwdupq_u16): Likewise.
>         (__arm_vdwdupq_u32): Likewise.
>         (__arm_vdwdupq_u8): Likewise.
>         (__arm_vaddlvaq): Likewise.
>         (__arm_vaddlvaq_p): Likewise.
>         (__arm_vaddvaq): Likewise.
>         (__arm_vaddvaq_p): Likewise.
>         (__arm_vcmphiq_m): Likewise.
>         (__arm_vmladavaq_p): Likewise.
>         (__arm_vmladavaxq): Likewise.
>         (__arm_vmlaldavaxq): Likewise.
>         (__arm_vrmlaldavhaq_p): Likewise.

IMO this should have been squashed with the previous patch.
Is all this covered by the tests that we have (or that you're improving in this series)?
Ok if so.
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h | 1106 +++++++++++++++++++-------------------
>  1 file changed, 553 insertions(+), 553 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 951dc25374b..fd1876b57a0 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35881,8 +35881,8 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)));})
> 
>  #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -35927,14 +35927,14 @@ extern void *__ARM_undef;
>  #define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -35959,14 +35959,14 @@ extern void *__ARM_undef;
>  #define __arm_vcmpeqq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpeqq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpeqq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpeqq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -35997,16 +35997,16 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpeqq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpeqq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t), p2), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double), p2));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double), p2), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double), p2));})
> 
>  #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36014,13 +36014,13 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgtq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgtq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgtq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpgtq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpgtq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)));})
> 
>  #define __arm_vcmpleq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36030,11 +36030,11 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpleq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpleq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpleq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)));})
> 
>  #define __arm_vcmpltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36042,25 +36042,25 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpltq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpltq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpltq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)));})
> 
>  #define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36115,8 +36115,8 @@ extern void *__ARM_undef;
>  #define __arm_vmaxnmavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmavq_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmavq_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmavq_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmavq_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
>  #define __arm_vmaxnmq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36127,14 +36127,14 @@ extern void *__ARM_undef;
>  #define __arm_vmaxnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
>  #define __arm_vmaxnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
>  #define __arm_vminnmaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36145,8 +36145,8 @@ extern void *__ARM_undef;
>  #define __arm_vminnmavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmavq_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmavq_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmavq_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmavq_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
>  #define __arm_vbrsrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> @@ -36168,14 +36168,14 @@ extern void *__ARM_undef;
>  #define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36188,8 +36188,8 @@ extern void *__ARM_undef;
>  #define __arm_vminnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmvq_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmvq_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmvq_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmvq_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
>  #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> @@ -36244,12 +36244,12 @@ extern void *__ARM_undef;
>  #define __arm_vrshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vrshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vrshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vrshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36280,12 +36280,12 @@ extern void *__ARM_undef;
>  #define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36336,12 +36336,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqrshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vqrdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36349,9 +36349,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqrdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqrdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqrdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vmlaldavxq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36384,8 +36384,8 @@ extern void *__ARM_undef;
>  #define __arm_vqdmulltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmulltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmulltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> @@ -36398,17 +36398,17 @@ extern void *__ARM_undef;
>  #define __arm_vqdmullbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> 
>  #define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> @@ -36416,12 +36416,12 @@ extern void *__ARM_undef;
>  #define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36454,12 +36454,12 @@ extern void *__ARM_undef;
>  #define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36484,12 +36484,12 @@ extern void *__ARM_undef;
>  #define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -36632,12 +36632,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vsriq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36716,44 +36716,44 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vqdmlashq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +	    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vqrdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vmlasq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vqdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16_t)), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32_t)));})
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int)), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int)));})
> 
>  #define __arm_vqrdmladhxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36943,11 +36943,11 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double), p2), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgtq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double), p2), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpgtq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpgtq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
> @@ -36959,11 +36959,11 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpleq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpleq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double), p2), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpleq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double), p2));})
> 
>  #define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36973,11 +36973,11 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpltq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpltq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double), p2), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpltq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double), p2));})
> 
>  #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -36990,14 +36990,14 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpneq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpneq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double), p2), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpneq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double), p2));})
> 
>  #define __arm_vcvtbq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37051,8 +37051,8 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double)), \
> -  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double)), \
> +  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double)), \
> +  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double)), \
>    int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_float16x8_t]: __arm_vfmaq_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t)), \
>    int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_float32x4_t]: __arm_vfmaq_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t)));})
> 
> @@ -37067,8 +37067,8 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double)), \
> -  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double)));})
> +  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double)), \
> +  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double)));})
> 
>  #define __arm_vmaxnmaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37091,14 +37091,14 @@ extern void *__ARM_undef;
>  #define __arm_vmaxnmavq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmavq_p_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmavq_p_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmavq_p_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmavq_p_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
>  #define __arm_vmaxnmvq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmvq_p_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmvq_p_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vmaxnmvq_p_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vmaxnmvq_p_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
>  #define __arm_vminnmaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37109,14 +37109,14 @@ extern void *__ARM_undef;
>  #define __arm_vminnmavq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmavq_p_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmavq_p_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmavq_p_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmavq_p_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
>  #define __arm_vminnmvq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmvq_p_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmvq_p_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vminnmvq_p_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vminnmvq_p_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
>  #define __arm_vrndnq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37178,13 +37178,13 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgeq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgeq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgeq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpgeq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpgeq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double)));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double)));})
> 
>  #define __arm_vrshrnbq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37285,11 +37285,11 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgeq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgeq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgeq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(__p1, double), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce2(p1, double), p2), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vcmpgeq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce2(p1, double), p2), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmpgeq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmpgeq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
> @@ -37324,8 +37324,8 @@ extern void *__ARM_undef;
>    int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
> -  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> +  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> +  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37466,15 +37466,15 @@ extern void *__ARM_undef;
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
>    int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_float16x8_t]: __arm_vfmaq_m_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_float32x4_t]: __arm_vfmaq_m_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
> -  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> +  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> +  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmaq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vfmasq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
> -  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> +  int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_m_n_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> +  int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_fp_n]: __arm_vfmasq_m_n_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vfmsq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37509,14 +37509,14 @@ extern void *__ARM_undef;
>    int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_uint32x4_t]: __arm_vmulq_m_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>    int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_float16x8_t]: __arm_vmulq_m_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_float32x4_t]: __arm_vmulq_m_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -37543,14 +37543,14 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -38023,19 +38023,19 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vaddq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vaddq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -38158,19 +38158,19 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vmulq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vmulq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
> @@ -38258,8 +38258,8 @@ extern void *__ARM_undef;
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(__p2, double), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(__p2, double), p3));})
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce2(p2, double), p3));})
> 
>  #define __arm_vcmulq_rot90_x(p1,p2,p3)  ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -38283,16 +38283,16 @@ extern void *__ARM_undef;
>  #define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
> __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
> __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
> __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
> __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int64x2_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
> __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
> __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
> __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
> __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t),
> __ARM_mve_coerce(__p1, uint64x2_t), p2), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vsetq_lane_f16 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vsetq_lane_f32 (__ARM_mve_coerce2(__p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
> __arm_vsetq_lane_s8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
> __arm_vsetq_lane_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
> __arm_vsetq_lane_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
> __arm_vsetq_lane_s64 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int64x2_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
> __arm_vsetq_lane_u8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
> __arm_vsetq_lane_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
> __arm_vsetq_lane_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
> __arm_vsetq_lane_u64 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint64x2_t), p2), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
> __arm_vsetq_lane_f16 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float16x8_t), p2), \
> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
> __arm_vsetq_lane_f32 (__ARM_mve_coerce2(p0, double),
> __ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
>  #else /* MVE Integer.  */
> 
> @@ -38410,12 +38410,12 @@ extern void *__ARM_undef;
>  #define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38442,12 +38442,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> @@ -38461,12 +38461,12 @@ extern void *__ARM_undef;
>  #define __arm_vrshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vrshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vrshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vrshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38497,12 +38497,12 @@ extern void *__ARM_undef;
>  #define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38571,12 +38571,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqrshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vqrdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -38584,16 +38584,16 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqrdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqrdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqrdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> @@ -38601,12 +38601,12 @@ extern void *__ARM_undef;
>  #define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38637,12 +38637,12 @@ extern void *__ARM_undef;
>  #define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38717,12 +38717,12 @@ extern void *__ARM_undef;
>  #define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38747,12 +38747,12 @@ extern void *__ARM_undef;
>  #define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> @@ -38858,12 +38858,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpeqq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmpeqq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpeqq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vqmovntq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -38944,16 +38944,16 @@ extern void *__ARM_undef;
>  #define __arm_vqdmulltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmulltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmulltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> 
>  #define __arm_vqdmullbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> @@ -38963,9 +38963,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgeq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgeq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgeq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -38973,9 +38973,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgtq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgtq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgtq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vcmpleq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -38983,9 +38983,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpleq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpleq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpleq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vcmpltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -38993,20 +38993,20 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpltq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpneq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpneq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpneq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> @@ -39031,12 +39031,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
>  #define __arm_vbicq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> @@ -39146,25 +39146,25 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vqdmlashq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0,
> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
> int)), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int)), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int)));})
> 
>  #define __arm_vqrdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0,
> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8_t)), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16_t)), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32_t)));})
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0,
> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
> int)), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int)), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int)));})
> 
>  #define __arm_vqrdmladhxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39227,9 +39227,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgeq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgeq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgeq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
> 
>  #define __arm_vcmpgtq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
> @@ -39238,9 +39238,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
>  #define __arm_vcmpleq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39248,9 +39248,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpleq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpleq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
>  #define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39258,9 +39258,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vcmpltq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vcmpltq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
>  #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39271,12 +39271,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmpneq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8_t), p2), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16_t), p2), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32_t), p2), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
>  #define __arm_vdupq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39299,23 +39299,23 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
> -  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)),
> \
> -  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t)), \
> -  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t)));})
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vmlasq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
> -  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t)), \
> -  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t)), \
> -  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t)));})
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int)));})
> 
>  #define __arm_vnegq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39340,9 +39340,9 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16_t)), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32_t)));})
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int)), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int)));})
> 
>  #define __arm_vqdmlsdhq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39505,12 +39505,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16_t), p3), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32_t), p3), \
> -  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int8x16_t]: __arm_vsubq_m_s8 (__ARM_mve_coerce(__p0,
> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t), p3), \
>    int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int16x8_t]: __arm_vsubq_m_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int32x4_t]: __arm_vsubq_m_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t), p3), \
> @@ -39610,12 +39610,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t), p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t), p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t), p3));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t), p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t), p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t), p3));})
> 
>  #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -39660,12 +39660,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16_t), p3), \
> -  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32_t), p3), \
> -  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
> int), p3), \
> +  int
> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
> ve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0,
> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
> ve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0,
> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int
> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
> ve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0,
> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int
> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
> _type_int8x16_t]: __arm_vmulq_m_s8 (__ARM_mve_coerce(__p0,
> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t), p3), \
>    int
> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
> _type_int16x8_t]: __arm_vmulq_m_s16 (__ARM_mve_coerce(__p0,
> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
> _type_int32x4_t]: __arm_vmulq_m_s32 (__ARM_mve_coerce(__p0,
> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t), p3), \
> @@ -40002,15 +40002,15 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3));})
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -40104,15 +40104,15 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3));})
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
> @@ -40234,14 +40234,14 @@ extern void *__ARM_undef;
>  #define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
> __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
> __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
> __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
> __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int64x2_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
> __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
> __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
> __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
> __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t),
> __ARM_mve_coerce(__p1, uint64x2_t), p2));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
> __arm_vsetq_lane_s8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
> __arm_vsetq_lane_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
> __arm_vsetq_lane_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
> __arm_vsetq_lane_s64 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int64x2_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
> __arm_vsetq_lane_u8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
> __arm_vsetq_lane_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
> __arm_vsetq_lane_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
> __arm_vsetq_lane_u64 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint64x2_t), p2));})
> 
>  #endif /* MVE Integer.  */
> 
> @@ -40421,12 +40421,12 @@ extern void *__ARM_undef;
>  #define __arm_vhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> @@ -40451,12 +40451,12 @@ extern void *__ARM_undef;
>  #define __arm_vhsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> @@ -40576,25 +40576,25 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vqrdmlashq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vqdmlashq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vqrshlq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -40695,12 +40695,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> @@ -40715,9 +40715,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqrdmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqrdmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqrdmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vqrdmlsdhxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -40843,17 +40843,17 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaq_p_s16 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmlaldavaq_p_u16 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmlaldavaq_p_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmlaldavaq_p_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmlaldavaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
>  #define __arm_vmlaldavaxq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaxq_p_s16 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaxq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlaldavaxq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlaldavaxq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
>  #define __arm_vmlsldavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -40992,12 +40992,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> @@ -41031,12 +41031,12 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vmaxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41064,23 +41064,23 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vmlasq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41126,12 +41126,12 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> @@ -41143,17 +41143,17 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vqdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> @@ -41164,15 +41164,15 @@ extern void *__ARM_undef;
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmullbq_m_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmullbq_m_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3));})
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmullbq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
>  #define __arm_vqdmulltq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
> -  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulltq_m_n_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulltq_m_s16 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulltq_m_s32 (__ARM_mve_coerce(__p0, int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
> @@ -41238,9 +41238,9 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaxq_p_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaxq_p_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaxq_p_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmladavaxq_p_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmladavaxq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmladavaxq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
>  #define __arm_vmullbq_poly_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41311,51 +41311,51 @@ extern void *__ARM_undef;
>  #define __arm_viwdupq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_viwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_m_wb_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_m_wb_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_m_wb_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4));})
> 
>  #define __arm_viwdupq_u16(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u16 (__ARM_mve_coerce(__p0, uint32_t), p1, (const int) p2), \
> +  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u16 (__ARM_mve_coerce3(p0, int), p1, (const int) p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u16 (__ARM_mve_coerce(__p0, uint32_t *), p1, (const int) p2));})
> 
>  #define __arm_viwdupq_u32(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u32 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
> +  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u32 (__ARM_mve_coerce3(p0, int), p1, p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
> 
>  #define __arm_viwdupq_u8(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u8 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
> +  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u8 (__ARM_mve_coerce3(p0, int), p1, p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u8 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
> 
>  #define __arm_vdwdupq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vdwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_m_wb_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_m_wb_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_m_wb_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4));})
> 
>  #define __arm_vdwdupq_u16(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u16 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
> +  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u16 (__ARM_mve_coerce3(p0, int), p1, p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u16 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
> 
>  #define __arm_vdwdupq_u32(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u32 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
> +  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u32 (__ARM_mve_coerce3(p0, int), p1, p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
> 
>  #define __arm_vdwdupq_u8(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u8 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
> +  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u8 (__ARM_mve_coerce3(p0, int), p1, p2), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u8 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
> 
>  #define __arm_vshlcq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> @@ -41392,14 +41392,14 @@ extern void *__ARM_undef;
>  #define __arm_vaddlvaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t)), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
>  #define __arm_vaddlvaq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_p_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_p_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddlvaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddlvaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
> 
>  #define __arm_vaddlvq(p0) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> @@ -41414,22 +41414,22 @@ extern void *__ARM_undef;
>  #define __arm_vaddvaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t)), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t)), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t)), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t)), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t)), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
>  #define __arm_vaddvaq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_p_s8 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_p_s16 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_p_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_p_u8 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_p_u16 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_p_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]: __arm_vaddvaq_p_s8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]: __arm_vaddvaq_p_s16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]: __arm_vaddvaq_p_s32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, int32x4_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]: __arm_vaddvaq_p_u8 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]: __arm_vaddvaq_p_u16 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]: __arm_vaddvaq_p_u32 (__ARM_mve_coerce3(p0, int), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
> 
>  #define __arm_vaddvq(p0) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> @@ -41455,9 +41455,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpcsq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmpcsq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpcsq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)));})
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vcmpcsq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41465,9 +41465,9 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmpcsq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmpcsq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmpcsq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2));})
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmpcsq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2));})
> 
>  #define __arm_vcmphiq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41475,16 +41475,16 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmphiq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmphiq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmphiq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t)));})
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
>  #define __arm_vcmphiq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8_t), p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16_t), p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32_t), p2), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int), p2), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vcmphiq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int), p2), \
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vcmphiq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcmphiq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcmphiq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2));})
> @@ -41581,34 +41581,34 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaq_s8 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaq_s16 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaq_s32 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaq_u8 (__ARM_mve_coerce(__p0, uint32_t),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaq_u16 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaq_u32 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaq_s8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaq_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaq_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaq_u8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaq_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaq_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> 
>  #define __arm_vmladavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
> p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t), p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t), p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t), p3));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t), p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t), p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t), p3));})
> 
>  #define __arm_vmladavaxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaxq_s8 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaxq_s16 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaxq_s32 (__ARM_mve_coerce(__p0, int32_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaxq_u8 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaxq_u16 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaxq_u32 (__ARM_mve_coerce(__p0,
> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
> e_int8x16_t]: __arm_vmladavaxq_s8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
> int8x16_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmladavaxq_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmladavaxq_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
> pe_uint8x16_t]: __arm_vmladavaxq_u8 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
> uint8x16_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmladavaxq_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmladavaxq_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> 
>  #define __arm_vmladavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41651,17 +41651,17 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmlaldavaq_s16 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmlaldavaq_s32 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmlaldavaq_u16 (__ARM_mve_coerce(__p0,
> uint64_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmlaldavaq_u32 (__ARM_mve_coerce(__p0,
> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmlaldavaq_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmlaldavaq_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
> pe_uint16x8_t]: __arm_vmlaldavaq_u16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
> uint16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vmlaldavaq_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> 
>  #define __arm_vmlaldavaxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmlaldavaxq_s16 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmlaldavaxq_s32 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
> e_int16x8_t]: __arm_vmlaldavaxq_s16 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
> int16x8_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vmlaldavaxq_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)));})
> 
>  #define __arm_vmlaldavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
> @@ -41856,15 +41856,15 @@ extern void *__ARM_undef;
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce(__p0, int64_t),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce(__p0,
> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t)), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t)));})
> 
>  #define __arm_vrmlaldavhaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
> eid(__p2)])0, \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vrmlaldavhaq_p_s32 (__ARM_mve_coerce(__p0,
> int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
> int32x4_t), p3), \
> -  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vrmlaldavhaq_p_u32 (__ARM_mve_coerce(__p0,
> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t), p3));})
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
> e_int32x4_t]: __arm_vrmlaldavhaq_p_s32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
> p3), \
> +  int
> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
> pe_uint32x4_t]: __arm_vrmlaldavhaq_p_u32 (__ARM_mve_coerce3(p0, int),
> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
> uint32x4_t), p3));})
> 
>  #define __arm_vstrbq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0);
> \
>    __typeof(p1) __p1 = (p1); \
> --
> 2.25.1
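
For readers following the diff above: the overloaded macros pick an intrinsic by encoding the type class of each argument as an array dimension and letting _Generic match the resulting pointer-to-array type. Below is a minimal sketch of that trick, not the arm_mve.h code itself. TYPEID, DISPATCH, vec_a_t, vec_b_t and the T_* constants are hypothetical names invented for illustration, and the integer results stand in for calls to the real intrinsics.

```c
#include <assert.h>

/* Hypothetical type classes; they start at 1 because they are used as
   array dimensions, which must be positive.  */
enum { T_INT_N = 1, T_VEC_A, T_VEC_B, T_OTHER };

/* Hypothetical stand-ins for two vector types.  */
typedef struct { int lanes[4]; } vec_a_t;
typedef struct { float lanes[4]; } vec_b_t;

/* Map the type of an expression to its class.  */
#define TYPEID(x) _Generic((x), \
    int: T_INT_N, \
    vec_a_t: T_VEC_A, \
    vec_b_t: T_VEC_B, \
    default: T_OTHER)

/* Encode both classes in one pointer-to-array type and select on it.
   The cast operand is never evaluated, only its type matters.  */
#define DISPATCH(p0, p1) \
  _Generic((int (*)[TYPEID(p0)][TYPEID(p1)])0, \
    int (*)[T_INT_N][T_VEC_A]: 10, \
    int (*)[T_INT_N][T_VEC_B]: 20, \
    default: -1)

static inline int dispatch_int_vec_a(int acc, vec_a_t v) { return DISPATCH(acc, v); }
static inline int dispatch_int_vec_b(int acc, vec_b_t v) { return DISPATCH(acc, v); }
static inline int dispatch_vec_vec(vec_a_t a, vec_a_t b) { return DISPATCH(a, b); }
```

Each combination of argument classes yields a distinct array type, so a single _Generic can select on several arguments at once. This sketch adds a default branch so that an unsupported combination yields -1 instead of a compile error.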


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
  2022-11-17 16:37 ` [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515] Andrea Corallo
@ 2022-11-18 16:58   ` Kyrylo Tkachov
  2022-11-20 22:49     ` Ramana Radhakrishnan
  2022-11-21 10:45     ` Stam Markianos-Wright
  0 siblings, 2 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-18 16:58 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Stam Markianos-Wright



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
> Wright@arm.com>
> Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
> overloading [PR107515]
> 
> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
> 
> This patch adds explicit references to other float types
> to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
> 
> gcc/ChangeLog:
>         PR 107515
>         * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but for now ok.
The ChangeLog should say "PR target/107515" for the git hook to recognize it IIRC.
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index fd1876b57a0..f6b42dc3fab 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35582,6 +35582,9 @@ enum {
>  	short: __ARM_mve_type_int_n, \
>  	int: __ARM_mve_type_int_n, \
>  	long: __ARM_mve_type_int_n, \
> +	_Float16: __ARM_mve_type_fp_n, \
> +	__fp16: __ARM_mve_type_fp_n, \
> +	float: __ARM_mve_type_fp_n, \
>  	double: __ARM_mve_type_fp_n, \
>  	long long: __ARM_mve_type_int_n, \
>  	unsigned char: __ARM_mve_type_int_n, \
> --
> 2.25.1
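
To illustrate why the extra associations in the hunk above matter: _Generic matches the exact type of its controlling expression, so a `float` scalar does not select a `double` association. The sketch below assumes hypothetical names (TYPEID_OLD, TYPEID_NEW and the TYPE_* constants are not from arm_mve.h). It leaves out `_Float16` and `__fp16` because their availability depends on the target.

```c
#include <assert.h>

/* Hypothetical classes standing in for the __ARM_mve_type_* enumerators.  */
enum { TYPE_UNKNOWN = 0, TYPE_INT_N = 1, TYPE_FP_N = 2 };

/* Only `double` is listed as a floating-point scalar.  */
#define TYPEID_OLD(x) _Generic((x), \
    short: TYPE_INT_N, \
    int: TYPE_INT_N, \
    long: TYPE_INT_N, \
    double: TYPE_FP_N, \
    default: TYPE_UNKNOWN)

/* `float` is listed explicitly as well.  */
#define TYPEID_NEW(x) _Generic((x), \
    short: TYPE_INT_N, \
    int: TYPE_INT_N, \
    long: TYPE_INT_N, \
    float: TYPE_FP_N, \
    double: TYPE_FP_N, \
    default: TYPE_UNKNOWN)

/* Small wrappers so the result can be inspected at run time.  */
static inline int classify_float_old(float f) { return TYPEID_OLD(f); }
static inline int classify_float_new(float f) { return TYPEID_NEW(f); }
static inline int classify_double_old(double d) { return TYPEID_OLD(d); }
```

In this sketch a `float` argument falls through to the default branch of TYPEID_OLD and is classified as floating-point only by TYPEID_NEW. That is the kind of gap the three added lines close.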


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
  2022-11-18 16:58   ` Kyrylo Tkachov
@ 2022-11-20 22:49     ` Ramana Radhakrishnan
  2022-11-21 14:11       ` Stam Markianos-Wright
  2022-11-21 10:45     ` Stam Markianos-Wright
  1 sibling, 1 reply; 82+ messages in thread
From: Ramana Radhakrishnan @ 2022-11-20 22:49 UTC (permalink / raw)
  To: Kyrylo Tkachov
  Cc: Andrea Corallo, gcc-patches, Richard Earnshaw, Stam Markianos-Wright

On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> > -----Original Message-----
> > From: Andrea Corallo <andrea.corallo@arm.com>
> > Sent: Thursday, November 17, 2022 4:38 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
> > Wright@arm.com>
> > Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
> > overloading [PR107515]
> >
> > From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >
> > This patch adds explicit references to other float types
> > to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
> >
> > gcc/ChangeLog:
> >         PR 107515
> >         * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
>
> Argh, I'm looking forward to when we move away from this _Generic business, but for now ok.
> The ChangeLog should say "PR target/107515" for the git hook to recognize it IIRC.

And the PR is against 11.x. Is there a plan to backport this and the
dependent patches to the relevant branches?

Ramana

> Thanks,
> Kyrill
>
> > ---
> >  gcc/config/arm/arm_mve.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> > index fd1876b57a0..f6b42dc3fab 100644
> > --- a/gcc/config/arm/arm_mve.h
> > +++ b/gcc/config/arm/arm_mve.h
> > @@ -35582,6 +35582,9 @@ enum {
> >       short: __ARM_mve_type_int_n, \
> >       int: __ARM_mve_type_int_n, \
> >       long: __ARM_mve_type_int_n, \
> > +     _Float16: __ARM_mve_type_fp_n, \
> > +     __fp16: __ARM_mve_type_fp_n, \
> > +     float: __ARM_mve_type_fp_n, \
> >       double: __ARM_mve_type_fp_n, \
> >       long long: __ARM_mve_type_int_n, \
> >       unsigned char: __ARM_mve_type_int_n, \
> > --
> > 2.25.1
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic
  2022-11-18 16:49   ` Kyrylo Tkachov
@ 2022-11-21 10:45     ` Stam Markianos-Wright
  0 siblings, 0 replies; 82+ messages in thread
From: Stam Markianos-Wright @ 2022-11-21 10:45 UTC (permalink / raw)
  To: Kyrylo Tkachov, Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw


On 11/18/22 16:49, Kyrylo Tkachov wrote:
>
>> -----Original Message-----
>> From: Andrea Corallo <andrea.corallo@arm.com>
>> Sent: Thursday, November 17, 2022 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
>> Wright@arm.com>
>> Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n
>> intrinsic
>>
>> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>> It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
>> resolution would fall back to the `__ARM_undef` failure state.
>>
>> This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79`
>> and
>> `6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
>> identified, because the tests were not checking for this kind of failure.
>>
>> The above commits changed the definitions of the intrinsics from using
>> `[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
>> allowed `int` to be supported in user code through the overloaded
>> `#defines`, but seems to have broken the `[u]int[8/16/32]_t` types.
>>
>> The solution implemented by this patch is to explicitly use a new
>> _Generic mapping from all the `[u]int[8/16/32]_t` types to `int`. With this
>> change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
>> user code and are handled by the overloading mechanism correctly.
>>
>> gcc/ChangeLog:
>>
>>          * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
>>          (__arm_vaddq_m_n_s32): Likewise.
>>          (__arm_vaddq_m_n_s16): Likewise.
>>          (__arm_vaddq_m_n_u8): Likewise.
>>          (__arm_vaddq_m_n_u32): Likewise.
>>          (__arm_vaddq_m_n_u16): Likewise.
>>          (__arm_vaddq_m): Fix Overloading.
>>          (__ARM_mve_coerce3): New.
> Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit message?
> Thanks,
> Kyrill

Thanks for the review! Ah yes, there was this one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96795

which was closed last time around.
It does make sense to add it, though, so we'll do that.

Thanks!
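
A plausible reading of the regression described in the quoted commit message is that _Generic does not apply the integer promotions to its controlling expression, so an `int8_t` scalar does not match an association written for `int`. The sketch below shows that behaviour, and how listing every integer type restores the match. ACCEPTS_STRICT and ACCEPTS_WIDE are hypothetical names, not the real `__ARM_mve_coerce` or `__ARM_mve_coerce3` definitions. The result 0 stands in for the `__ARM_undef` fallback.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if `param` has exactly type `type`, 0 for the fallback branch.  */
#define ACCEPTS_STRICT(param, type) \
  _Generic((param), type: 1, default: 0)

/* 1 for any standard integer type, 0 otherwise.  The types are spelled
   out instead of using the <stdint.h> typedefs, because two typedefs of
   the same underlying type would be duplicate associations.  */
#define ACCEPTS_WIDE(param) \
  _Generic((param), \
    signed char: 1, short: 1, int: 1, long: 1, long long: 1, \
    unsigned char: 1, unsigned short: 1, unsigned int: 1, \
    unsigned long: 1, unsigned long long: 1, \
    default: 0)

static inline int strict_accepts_int(int b) { return ACCEPTS_STRICT(b, int); }
static inline int strict_accepts_int8(int8_t b) { return ACCEPTS_STRICT(b, int); }
static inline int wide_accepts_int8(int8_t b) { return ACCEPTS_WIDE(b); }
static inline int wide_accepts_uint16(uint16_t b) { return ACCEPTS_WIDE(b); }
static inline int wide_accepts_double(double b) { return ACCEPTS_WIDE(b); }
```

Under these assumptions, only the wide mapping accepts both `int` and the fixed-width scalar types, which matches the goal stated in the commit message.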

>
>> ---
>>   gcc/config/arm/arm_mve.h | 78 ++++++++++++++++++++--------------------
>>   1 file changed, 40 insertions(+), 38 deletions(-)
>>
>> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
>> index 684f997520f..951dc25374b 100644
>> --- a/gcc/config/arm/arm_mve.h
>> +++ b/gcc/config/arm/arm_mve.h
>> @@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive,
>> uint16x8_t __a, uint16x8_t __b, mve_pr
>>
>>   __extension__ extern __inline int8x16_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
>> mve_pred16_t __p)
>>   {
>>     return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline int32x4_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b,
>> mve_pred16_t __p)
>>   {
>>     return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline int16x8_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b,
>> mve_pred16_t __p)
>>   {
>>     return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline uint8x16_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
>> mve_pred16_t __p)
>>   {
>>     return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline uint32x4_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t
>> __b, mve_pred16_t __p)
>>   {
>>     return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline uint16x8_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t
>> __b, mve_pred16_t __p)
>>   {
>>     return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
>>   }
>> @@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive,
>> uint16x8_t __a, uint16x8_t __b, mve_pred16
>>
>>   __extension__ extern __inline int8x16_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
>> mve_pred16_t __p)
>>   {
>>    return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline int32x4_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b,
>> mve_pred16_t __p)
>>   {
>>    return __arm_vaddq_m_n_s32 (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline int16x8_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b,
>> mve_pred16_t __p)
>>   {
>>    return __arm_vaddq_m_n_s16 (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline uint8x16_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
>> mve_pred16_t __p)
>>   {
>>    return __arm_vaddq_m_n_u8 (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline uint32x4_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b,
>> mve_pred16_t __p)
>>   {
>>    return __arm_vaddq_m_n_u32 (__inactive, __a, __b, __p);
>>   }
>>
>>   __extension__ extern __inline uint16x8_t
>>   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> -__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, int __b,
>> mve_pred16_t __p)
>> +__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b,
>> mve_pred16_t __p)
>>   {
>>    return __arm_vaddq_m_n_u16 (__inactive, __a, __b, __p);
>>   }
>> @@ -35657,6 +35657,8 @@ extern void *__ARM_undef;
>>       _Generic(param, type: param, const type: param, default: *(type
>> *)__ARM_undef)
>>   #define __ARM_mve_coerce2(param, type) \
>>       _Generic(param, type: param, float16_t: param, float32_t: param, default:
>> *(type *)__ARM_undef)
>> +#define __ARM_mve_coerce3(param, type) \
>> +    _Generic(param, type: param, int8_t: param, int16_t: param, int32_t:
>> param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param,
>> uint64_t: param, default: *(type *)__ARM_undef)
>>
>>   #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
>>
>> @@ -35871,14 +35873,14 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vaddq_f16 (__ARM_mve_coerce(p0, float16x8_t),
>> __ARM_mve_coerce(p1, float16x8_t)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vaddq_f32 (__ARM_mve_coerce(p0, float32x4_t),
>> __ARM_mve_coerce(p1, float32x4_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vaddq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vaddq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)));})
>>
>> @@ -37316,12 +37318,12 @@ extern void *__ARM_undef;
>>     int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_uint32x4_t]: __arm_vaddq_m_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>>     int
>> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
>> mve_type_float16x8_t]: __arm_vaddq_m_f16 (__ARM_mve_coerce(__p0,
>> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
>> mve_type_float32x4_t]: __arm_vaddq_m_f32 (__ARM_mve_coerce(__p0,
>> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, int), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, int), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
>> mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0,
>> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(__p2, double), p3), \
>>     int
>> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
>> mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0,
>> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(__p2, double), p3));})
>>
>> @@ -38820,12 +38822,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int)));})
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39641,12 +39643,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, int), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, int), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vaddq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vaddq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vaddq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> --
>> 2.25.1


* Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
  2022-11-18 16:58   ` Kyrylo Tkachov
  2022-11-20 22:49     ` Ramana Radhakrishnan
@ 2022-11-21 10:45     ` Stam Markianos-Wright
  1 sibling, 0 replies; 82+ messages in thread
From: Stam Markianos-Wright @ 2022-11-21 10:45 UTC (permalink / raw)
  To: Kyrylo Tkachov, Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw


On 11/18/22 16:58, Kyrylo Tkachov wrote:
>
>> -----Original Message-----
>> From: Andrea Corallo <andrea.corallo@arm.com>
>> Sent: Thursday, November 17, 2022 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
>> Wright@arm.com>
>> Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
>> overloading [PR107515]
>>
>> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>> This patch adds explicit references to other float types
>> to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
>>
>> gcc/ChangeLog:
>>          PR 107515
>>          * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
> Argh, I'm looking forward to when we move away from this _Generic business, but for now ok.
Oh we all are ;)
> The ChangeLog should say "PR target/107515" for the git hook to recognize it IIRC.

Agh, thanks for spotting this! Will change and push it with the rest of
the patch series when ready.
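
For anyone reading along, here is a minimal sketch of why the extra
associations matter. It is not the arm_mve.h code: the SKETCH_* names
are hypothetical, and only portable `float`/`double` are used (the real
patch also lists `_Float16` and `__fp16`). `_Generic` matches on the
exact type with no promotion, so a `float` argument does not select a
`double:` association.

```c
#include <assert.h>

/* Hypothetical type ids, loosely modelled on __ARM_mve_type_int_n and
   __ARM_mve_type_fp_n; the values are assumptions.  */
enum { SKETCH_TYPE_UNKNOWN, SKETCH_TYPE_INT_N, SKETCH_TYPE_FP_N };

/* Before: only double is classified as a floating-point scalar.  */
#define SKETCH_TYPEID_OLD(x) \
  _Generic ((x), \
            int: SKETCH_TYPE_INT_N, \
            double: SKETCH_TYPE_FP_N, \
            default: SKETCH_TYPE_UNKNOWN)

/* After: float is listed explicitly as well.  */
#define SKETCH_TYPEID_NEW(x) \
  _Generic ((x), \
            int: SKETCH_TYPE_INT_N, \
            float: SKETCH_TYPE_FP_N, \
            double: SKETCH_TYPE_FP_N, \
            default: SKETCH_TYPE_UNKNOWN)

/* Returns 0 if every check below holds.  */
int
check_float_typeid (void)
{
  float f = 1.0f;

  /* A float is not promoted to double inside _Generic.  */
  assert (SKETCH_TYPEID_OLD (f) == SKETCH_TYPE_UNKNOWN);
  assert (SKETCH_TYPEID_NEW (f) == SKETCH_TYPE_FP_N);
  /* double is classified the same way by both versions.  */
  assert (SKETCH_TYPEID_OLD (1.0) == SKETCH_TYPE_FP_N);
  assert (SKETCH_TYPEID_NEW (1.0) == SKETCH_TYPE_FP_N);
  return 0;
}
```

In the sketch the unmatched case just yields an "unknown" id; what the
real header does in that case is not shown here.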

Thank you,

Stam


> Thanks,
> Kyrill
>
>> ---
>>   gcc/config/arm/arm_mve.h | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
>> index fd1876b57a0..f6b42dc3fab 100644
>> --- a/gcc/config/arm/arm_mve.h
>> +++ b/gcc/config/arm/arm_mve.h
>> @@ -35582,6 +35582,9 @@ enum {
>>        short: __ARM_mve_type_int_n, \
>>        int: __ARM_mve_type_int_n, \
>>        long: __ARM_mve_type_int_n, \
>> +     _Float16: __ARM_mve_type_fp_n, \
>> +     __fp16: __ARM_mve_type_fp_n, \
>> +     float: __ARM_mve_type_fp_n, \
>>        double: __ARM_mve_type_fp_n, \
>>        long long: __ARM_mve_type_int_n, \
>>        unsigned char: __ARM_mve_type_int_n, \
>> --
>> 2.25.1


* Re: [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters
  2022-11-18 16:51   ` Kyrylo Tkachov
@ 2022-11-21 10:46     ` Stam Markianos-Wright
  0 siblings, 0 replies; 82+ messages in thread
From: Stam Markianos-Wright @ 2022-11-21 10:46 UTC (permalink / raw)
  To: Kyrylo Tkachov, Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw


On 11/18/22 16:51, Kyrylo Tkachov wrote:
>
>> -----Original Message-----
>> From: Andrea Corallo <andrea.corallo@arm.com>
>> Sent: Thursday, November 17, 2022 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
>> Wright@arm.com>
>> Subject: [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic
>> scalar parameters
>>
>> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>> This is a mechanical patch that propagates the change proposed in
>> my previous patch for vaddq[_m]_n
>> across all other polymorphic MVE intrinsic overloads of scalar types.
>>
>> The find and Replace patterns used were:
>>
>> s/__ARM_mve_coerce\(__p(\d+), [u]?int(8|16|32|64)_t\)
>> /__ARM_mve_coerce3(p$1, int)/g
>>
>> s/__ARM_mve_coerce2\(__p(\d+), double\)
>> /__ARM_mve_coerce2(p$1, double)/g
>>
>> gcc/ChangeLog:
>>
>>          * config/arm/arm_mve.h (__arm_vaddq): Fix Overloading.
>>          (__arm_vmulq): Likewise.
>>          (__arm_vcmpeqq): Likewise.
>>          (__arm_vcmpneq): Likewise.
>>          (__arm_vmaxnmavq): Likewise.
>>          (__arm_vmaxnmvq): Likewise.
>>          (__arm_vminnmavq): Likewise.
>>          (__arm_vsubq): Likewise.
>>          (__arm_vminnmvq): Likewise.
>>          (__arm_vrshlq): Likewise.
>>          (__arm_vqsubq): Likewise.
>>          (__arm_vqdmulltq): Likewise.
>>          (__arm_vqdmullbq): Likewise.
>>          (__arm_vqdmulhq): Likewise.
>>          (__arm_vqaddq): Likewise.
>>          (__arm_vhaddq): Likewise.
>>          (__arm_vhsubq): Likewise.
>>          (__arm_vqdmlashq): Likewise.
>>          (__arm_vqrdmlahq): Likewise.
>>          (__arm_vmlasq): Likewise.
>>          (__arm_vqdmlahq): Likewise.
>>          (__arm_vmaxnmavq_p): Likewise.
>>          (__arm_vmaxnmvq_p): Likewise.
>>          (__arm_vminnmavq_p): Likewise.
>>          (__arm_vminnmvq_p): Likewise.
>>          (__arm_vfmasq_m): Likewise.
>>          (__arm_vsetq_lane): Likewise.
>>          (__arm_vcmpneq_m): Likewise.
>>          (__arm_vhaddq_x): Likewise.
>>          (__arm_vhsubq_x): Likewise.
>>          (__arm_vqrdmlashq_m): Likewise.
>>          (__arm_vqdmlashq_m): Likewise.
>>          (__arm_vmlaldavaxq_p): Likewise.
>>          (__arm_vmlasq_m): Likewise.
>>          (__arm_vqdmulhq_m): Likewise.
>>          (__arm_vqdmulltq_m): Likewise.
>>          (__arm_viwdupq_m): Likewise.
>>          (__arm_viwdupq_u16): Likewise.
>>          (__arm_viwdupq_u32): Likewise.
>>          (__arm_viwdupq_u8): Likewise.
>>          (__arm_vdwdupq_m): Likewise.
>>          (__arm_vdwdupq_u16): Likewise.
>>          (__arm_vdwdupq_u32): Likewise.
>>          (__arm_vdwdupq_u8): Likewise.
>>          (__arm_vaddlvaq): Likewise.
>>          (__arm_vaddlvaq_p): Likewise.
>>          (__arm_vaddvaq): Likewise.
>>          (__arm_vaddvaq_p): Likewise.
>>          (__arm_vcmphiq_m): Likewise.
>>          (__arm_vmladavaq_p): Likewise.
>>          (__arm_vmladavaxq): Likewise.
>>          (__arm_vmlaldavaxq): Likewise.
>>          (__arm_vrmlaldavhaq_p): Likewise.
> IMO this should have been squashed with the previous patch.
> Is all this covered by the tests that we have (or that you're improving in this series)?

Thanks for the review! Yes, I just kept them separate because the previous
one (the vaddq patch) was done manually as a partial revert of what was
previously done, whereas this one was all a mechanical find-and-replace --
but the end result they both get to is the same, tbh...
Also, yep, this batch of tests fully covers the intrinsics that were
changed with this patch (and the vaddq patch) and we've also added
testcases that take immediate `1` for the `_n` variants.
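
To illustrate why the immediate `1` case is worth testing, here is a
rough sketch under stated assumptions. The ACCEPTS_* macros are
hypothetical stand-ins, not the header's __ARM_mve_coerce or
__ARM_mve_coerce3: they only report whether an argument's type would be
accepted. The literal `1` has type `int`, so a selection that only
lists `int8_t` falls through to the default, while one that lists every
integer type accepts it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: accept exactly one scalar type, reject the rest.  */
#define ACCEPTS_ONLY(param, type) \
  _Generic ((param), type: 1, default: 0)

/* Hypothetical: accept any standard integer type.  Basic types are
   listed instead of the intN_t typedefs so that no two associations
   name the same type on any host.  */
#define ACCEPTS_ANY_INT(param) \
  _Generic ((param), \
            char: 1, signed char: 1, short: 1, int: 1, long: 1, \
            long long: 1, unsigned char: 1, unsigned short: 1, \
            unsigned int: 1, unsigned long: 1, unsigned long long: 1, \
            default: 0)

/* Returns 0 if every check below holds.  */
int
check_immediate_dispatch (void)
{
  int8_t narrow = 1;

  /* A variable of the exact type is accepted either way.  */
  assert (ACCEPTS_ONLY (narrow, int8_t) == 1);
  assert (ACCEPTS_ANY_INT (narrow) == 1);
  /* The literal 1 is an int: rejected by the strict form, accepted
     by the permissive one.  */
  assert (ACCEPTS_ONLY (1, int8_t) == 0);
  assert (ACCEPTS_ANY_INT (1) == 1);
  /* Floating-point scalars are still rejected.  */
  assert (ACCEPTS_ANY_INT (1.0) == 0);
  return 0;
}
```

The sketch lists basic types rather than `type` plus the intN_t names
because on hosts where `int32_t` is a typedef for `int`, naming both in
one `_Generic` would be a duplicate association.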


> Ok if so.
> Thanks,
> Kyrill
>
>> ---
>>   gcc/config/arm/arm_mve.h | 1106 +++++++++++++++++++-------------------
>>   1 file changed, 553 insertions(+), 553 deletions(-)
>>
>> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
>> index 951dc25374b..fd1876b57a0 100644
>> --- a/gcc/config/arm/arm_mve.h
>> +++ b/gcc/config/arm/arm_mve.h
>> @@ -35881,8 +35881,8 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)));})
>>
>>   #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -35927,14 +35927,14 @@ extern void *__ARM_undef;
>>   #define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -35959,14 +35959,14 @@ extern void *__ARM_undef;
>>   #define __arm_vcmpeqq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpeqq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpeqq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpeqq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -35997,16 +35997,16 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpeqq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpeqq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double), p2));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double), p2));})
>>
>>   #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36014,13 +36014,13 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgtq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgtq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgtq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpgtq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpgtq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)));})
>>
>>   #define __arm_vcmpleq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36030,11 +36030,11 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpleq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpleq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpleq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)));})
>>
>>   #define __arm_vcmpltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36042,25 +36042,25 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpltq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpltq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpltq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)));})
>>
>>   #define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36115,8 +36115,8 @@ extern void *__ARM_undef;
>>   #define __arm_vmaxnmavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmavq_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmavq_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmavq_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmavq_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>>
>>   #define __arm_vmaxnmq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36127,14 +36127,14 @@ extern void *__ARM_undef;
>>   #define __arm_vmaxnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>>
>>   #define __arm_vmaxnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmvq_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmvq_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>>
>>   #define __arm_vminnmaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36145,8 +36145,8 @@ extern void *__ARM_undef;
>>   #define __arm_vminnmavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmavq_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmavq_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmavq_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmavq_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>>
>>   #define __arm_vbrsrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> @@ -36168,14 +36168,14 @@ extern void *__ARM_undef;
>>   #define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36188,8 +36188,8 @@ extern void *__ARM_undef;
>>   #define __arm_vminnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmvq_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmvq_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmvq_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmvq_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t)));})
>>
>>   #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> @@ -36244,12 +36244,12 @@ extern void *__ARM_undef;
>>   #define __arm_vrshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vrshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vrshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vrshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36280,12 +36280,12 @@ extern void *__ARM_undef;
>>   #define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36336,12 +36336,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqrshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vqrdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36349,9 +36349,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqrdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqrdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqrdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vmlaldavxq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36384,8 +36384,8 @@ extern void *__ARM_undef;
>>   #define __arm_vqdmulltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqdmulltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqdmulltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)));})
>>
>> @@ -36398,17 +36398,17 @@ extern void *__ARM_undef;
>>   #define __arm_vqdmullbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)));})
>>
>>   #define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)));})
>> @@ -36416,12 +36416,12 @@ extern void *__ARM_undef;
>>   #define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36454,12 +36454,12 @@ extern void *__ARM_undef;
>>   #define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36484,12 +36484,12 @@ extern void *__ARM_undef;
>>   #define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -36632,12 +36632,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vsriq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36716,44 +36716,44 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -         int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +         int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqdmlashq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -         int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +         int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqrdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vmlasq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqrdmladhxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36943,11 +36943,11 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgtq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpgtq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpgtq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>> @@ -36959,11 +36959,11 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpleq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpleq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpleq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double), p2));})
>>
>>   #define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36973,11 +36973,11 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpltq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpltq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpltq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double), p2));})
>>
>>   #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -36990,14 +36990,14 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpneq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpneq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpneq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double), p2));})
>>
>>   #define __arm_vcvtbq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37051,8 +37051,8 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double)), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double)), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vfmaq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vfmaq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
>>
>> @@ -37067,8 +37067,8 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double)));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double)));})
>>
>>   #define __arm_vmaxnmaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37091,14 +37091,14 @@ extern void *__ARM_undef;
>>   #define __arm_vmaxnmavq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmavq_p_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmavq_p_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmavq_p_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmavq_p_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>>   #define __arm_vmaxnmvq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmvq_p_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmvq_p_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vmaxnmvq_p_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vmaxnmvq_p_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>>   #define __arm_vminnmaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37109,14 +37109,14 @@ extern void *__ARM_undef;
>>   #define __arm_vminnmavq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmavq_p_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmavq_p_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmavq_p_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmavq_p_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>>   #define __arm_vminnmvq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmvq_p_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmvq_p_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vminnmvq_p_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vminnmvq_p_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>>   #define __arm_vrndnq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37178,13 +37178,13 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgeq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgeq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgeq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpgeq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t)), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpgeq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t)), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double)), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double)));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double)), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double)));})
>>
>>   #define __arm_vrshrnbq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37285,11 +37285,11 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgeq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgeq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgeq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(__p1, double), p2), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vcmpgeq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce2(p1, double), p2), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vcmpgeq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vcmpgeq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>> @@ -37324,8 +37324,8 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37466,15 +37466,15 @@ extern void *__ARM_undef;
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vfmaq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vfmaq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmaq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vfmasq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vfmasq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vfmsq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37509,14 +37509,14 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vmulq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vmulq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -37543,14 +37543,14 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32_t), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -38023,19 +38023,19 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32_t), p3), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vaddq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vaddq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>> @@ -38158,19 +38158,19 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32_t), p3), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vmulq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vmulq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>> @@ -38258,8 +38258,8 @@ extern void *__ARM_undef;
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
>> __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
>> __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(__p2, double), p3), \
>> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(__p2, double), p3));})
>> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
>> __ARM_mve_coerce2(p2, double), p3), \
>> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]:
>> __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t),
>> __ARM_mve_coerce2(p2, double), p3));})
>>
>>   #define __arm_vcmulq_rot90_x(p1,p2,p3)  ({ __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>> @@ -38283,16 +38283,16 @@ extern void *__ARM_undef;
>>   #define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
>> __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int64x2_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
>> __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t),
>> __ARM_mve_coerce(__p1, uint64x2_t), p2), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vsetq_lane_f16 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vsetq_lane_f32 (__ARM_mve_coerce2(__p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vsetq_lane_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vsetq_lane_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vsetq_lane_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
>> __arm_vsetq_lane_s64 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int64x2_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vsetq_lane_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vsetq_lane_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vsetq_lane_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
>> __arm_vsetq_lane_u64 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint64x2_t), p2), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float16x8_t]:
>> __arm_vsetq_lane_f16 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_fp_n][__ARM_mve_type_float32x4_t]:
>> __arm_vsetq_lane_f32 (__ARM_mve_coerce2(p0, double),
>> __ARM_mve_coerce(__p1, float32x4_t), p2));})
>>
>>   #else /* MVE Integer.  */
>>
>> @@ -38410,12 +38410,12 @@ extern void *__ARM_undef;
>>   #define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38442,12 +38442,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> @@ -38461,12 +38461,12 @@ extern void *__ARM_undef;
>>   #define __arm_vrshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vrshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vrshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vrshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38497,12 +38497,12 @@ extern void *__ARM_undef;
>>   #define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38571,12 +38571,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqrshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrshlq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vqrdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -38584,16 +38584,16 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqrdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqrdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqrdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)));})
>> @@ -38601,12 +38601,12 @@ extern void *__ARM_undef;
>>   #define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38637,12 +38637,12 @@ extern void *__ARM_undef;
>>   #define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38717,12 +38717,12 @@ extern void *__ARM_undef;
>>   #define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38747,12 +38747,12 @@ extern void *__ARM_undef;
>>   #define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> @@ -38858,12 +38858,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpeqq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmpeqq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpeqq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vqmovntq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -38944,16 +38944,16 @@ extern void *__ARM_undef;
>>   #define __arm_vqdmulltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmulltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqdmulltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqdmulltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)));})
>>
>>   #define __arm_vqdmullbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmullbq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)));})
>>
>> @@ -38963,9 +38963,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgeq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgeq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgeq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -38973,9 +38973,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgtq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgtq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgtq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vcmpleq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -38983,9 +38983,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpleq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpleq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpleq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vcmpltq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -38993,20 +38993,20 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpltq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpltq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpltq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpneq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpneq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpneq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> @@ -39031,12 +39031,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>   #define __arm_vbicq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> @@ -39146,25 +39146,25 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqdmlashq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqdmlashq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqdmlashq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqdmlashq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqrdmlahq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlahq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vqrdmlahq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vqrdmladhxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39227,9 +39227,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgeq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgeq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgeq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgeq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>
>>   #define __arm_vcmpgtq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>> @@ -39238,9 +39238,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpgtq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpgtq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpgtq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpgtq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>   #define __arm_vcmpleq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39248,9 +39248,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpleq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpleq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpleq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpleq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>   #define __arm_vcmpltq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39258,9 +39258,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vcmpltq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vcmpltq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vcmpltq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpltq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>   #define __arm_vcmpneq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
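
For anyone following the scalar arms in these hunks: as far as I can tell the point of moving from __ARM_mve_coerce(__p1, int8_t) to __ARM_mve_coerce3(p1, int) is that _Generic matches types exactly, with no integer conversion, so a plain constant such as 1 (type int) never matches an int8_t association. The definition of __ARM_mve_coerce3 is not in the quoted hunks, so take this as my reading rather than a description of the macro. A minimal standalone sketch of the matching rule only, in plain C11 with a hypothetical helper name (EXACT_TYPE_P is not part of arm_mve.h):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of arm_mve.h: evaluates to 1 when the
   type of X is exactly TYPE, and to 0 otherwise.  */
#define EXACT_TYPE_P(x, type) _Generic ((x), type: 1, default: 0)

/* An int8_t object matches an int8_t association.  */
int
int8_object_matches_int8 (void)
{
  int8_t v = 1;
  return EXACT_TYPE_P (v, int8_t);
}

/* An integer constant has type int, so it does not match int8_t...  */
int
int_constant_matches_int8 (void)
{
  return EXACT_TYPE_P (1, int8_t);
}

/* ...but it does match int.  */
int
int_constant_matches_int (void)
{
  return EXACT_TYPE_P (1, int);
}

/* The same facts checked at compile time.  */
static_assert (EXACT_TYPE_P (1, int8_t) == 0, "an int constant is not int8_t");
static_assert (EXACT_TYPE_P (1, int) == 1, "an int constant is int");
```

So with the old arms, a call that passes a literal scalar would presumably fall through to the default branch of the coercion rather than reach the intrinsic.
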
>> @@ -39271,12 +39271,12 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpneq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmpneq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpneq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8_t), p2), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16_t), p2), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32_t), p2), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2));})
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpneq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>   #define __arm_vdupq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39299,23 +39299,23 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8_t)),
>> \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t)), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t)));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vmlasq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32_t)), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t)), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t)), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t)));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int)));})
>>
>>   #define __arm_vnegq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39340,9 +39340,9 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t)));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int)), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int)), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int)));})
>>
>>   #define __arm_vqdmlsdhq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39505,12 +39505,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vsubq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vsubq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vsubq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> @@ -39610,12 +39610,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>>
>>   #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -39660,12 +39660,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vmulq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vmulq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vmulq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> @@ -40002,15 +40002,15 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32_t), p3), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3));})
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3));})
>>
>>   #define __arm_vcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>> @@ -40104,15 +40104,15 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32_t), p3), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3));})
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3));})
>>
>>   #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>> @@ -40234,14 +40234,14 @@ extern void *__ARM_undef;
>>   #define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
>> __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int64x2_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
>> __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t),
>> __ARM_mve_coerce(__p1, uint64x2_t), p2));})
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vsetq_lane_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vsetq_lane_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vsetq_lane_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int64x2_t]:
>> __arm_vsetq_lane_s64 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int64x2_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vsetq_lane_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vsetq_lane_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vsetq_lane_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint64x2_t]:
>> __arm_vsetq_lane_u64 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint64x2_t), p2));})
>>
>>   #endif /* MVE Integer.  */
>>
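
Since the mail wrapping makes the dispatch in these macros hard to read, here is a small portable sketch of the pattern they use: each argument is mapped to an integer type id, the ids become the dimensions of a pointer-to-array type, and _Generic selects on that type. Everything named below (the vec_* structs, the TID_* ids, TYPEID, ADD and the add_* functions) is a hypothetical stand-in and not the arm_mve.h definition. One deliberate difference: the sketch selects a function and then calls it, whereas the header puts a complete call in every association, which I believe is why it needs the coerce helpers to keep the unselected arms valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for vector types; not the MVE types.  */
typedef struct { int8_t lane[16]; } vec_s8;
typedef struct { int16_t lane[8]; } vec_s16;

/* Type ids start at 1 because they are used as array dimensions.  */
enum { TID_VEC_S8 = 1, TID_VEC_S16, TID_INT_N, TID_OTHER };

/* Map an expression to its type id.  Assumes int8_t, int16_t and int
   are three distinct types, i.e. int is wider than 16 bits.  */
#define TYPEID(x) _Generic ((x), \
  vec_s8: TID_VEC_S8, \
  vec_s16: TID_VEC_S16, \
  int8_t: TID_INT_N, \
  int16_t: TID_INT_N, \
  int: TID_INT_N, \
  default: TID_OTHER)

vec_s8
add_s8 (vec_s8 a, vec_s8 b)
{
  for (int i = 0; i < 16; i++)
    a.lane[i] = (int8_t) (a.lane[i] + b.lane[i]);
  return a;
}

vec_s8
add_n_s8 (vec_s8 a, int8_t b)
{
  for (int i = 0; i < 16; i++)
    a.lane[i] = (int8_t) (a.lane[i] + b);
  return a;
}

vec_s16
add_s16 (vec_s16 a, vec_s16 b)
{
  for (int i = 0; i < 8; i++)
    a.lane[i] = (int16_t) (a.lane[i] + b.lane[i]);
  return a;
}

vec_s16
add_n_s16 (vec_s16 a, int16_t b)
{
  for (int i = 0; i < 8; i++)
    a.lane[i] = (int16_t) (a.lane[i] + b);
  return a;
}

/* Encode both type ids as array dimensions and select on the resulting
   pointer type.  An unsupported combination has no association and is
   rejected at compile time.  */
#define ADD(a, b) _Generic ((int (*)[TYPEID (a)][TYPEID (b)]) 0, \
  int (*)[TID_VEC_S8][TID_VEC_S8]: add_s8, \
  int (*)[TID_VEC_S8][TID_INT_N]: add_n_s8, \
  int (*)[TID_VEC_S16][TID_VEC_S16]: add_s16, \
  int (*)[TID_VEC_S16][TID_INT_N]: add_n_s16) (a, b)
```

With this, ADD (v, v) on two vec_s8 values resolves to add_s8, and ADD (v, 5) resolves to add_n_s8 because the constant 5 maps to TID_INT_N.
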
>> @@ -40421,12 +40421,12 @@ extern void *__ARM_undef;
>>   #define __arm_vhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32x4_t), p3), \
>> @@ -40451,12 +40451,12 @@ extern void *__ARM_undef;
>>   #define __arm_vhsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8_t), p3), \
>> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16_t), p3), \
>> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32_t), p3), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
>> __arm_vhsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
>> __ARM_mve_coerce(__p2, int8x16_t), p3), \
>>     int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
>> __arm_vhsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
>> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>>     int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
>> __arm_vhsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
>> __ARM_mve_coerce(__p2, int32x4_t), p3), \
>> @@ -40576,25 +40576,25 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlahq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlahq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlahq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlahq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlahq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlahq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3));})
>>
>>   #define __arm_vqrdmlashq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlashq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlashq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlashq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlashq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlashq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmlashq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3));})
>>
>>   #define __arm_vqdmlashq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlashq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlashq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlashq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlashq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlashq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlashq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3));})
>>
>>   #define __arm_vqrshlq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -40695,12 +40695,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vqsubq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vqsubq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vqsubq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> @@ -40715,9 +40715,9 @@ extern void *__ARM_undef;
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vqrdmulhq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vqrdmulhq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vqrdmulhq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmulhq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmulhq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmulhq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmulhq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmulhq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqrdmulhq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3));})
>>
>>   #define __arm_vqrdmlsdhxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -40843,17 +40843,17 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaq_p_s16 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaq_p_s32 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmlaldavaq_p_u16 (__ARM_mve_coerce(__p0,
>> uint64_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmlaldavaq_p_u32 (__ARM_mve_coerce(__p0,
>> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaq_p_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmlaldavaq_p_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmlaldavaq_p_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>>
>>   #define __arm_vmlaldavaxq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaxq_p_s16 (__ARM_mve_coerce(__p0,
>> int64_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaxq_p_s32 (__ARM_mve_coerce(__p0,
>> int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaxq_p_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaxq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3));})
>>
>>   #define __arm_vmlsldavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -40992,12 +40992,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vhaddq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vhaddq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vhaddq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> @@ -41031,12 +41031,12 @@ extern void *__ARM_undef;
>>     int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_uint8x16_t]: __arm_vhsubq_m_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_uint16x8_t]: __arm_vhsubq_m_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_uint32x4_t]: __arm_vhsubq_m_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3));})
>>
>>   #define __arm_vmaxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41064,23 +41064,23 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlaq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlaq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3));})
>>
>>   #define __arm_vmlasq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vmlasq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3));})
>>
>>   #define __arm_vmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41126,12 +41126,12 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce(__p2, uint8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce(__p2, uint16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce(__p2, uint32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_m
>> ve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0,
>> uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_m
>> ve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0,
>> uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>> +  int
>> (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_m
>> ve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0,
>> uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),
>> __ARM_mve_coerce3(p2, int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vqaddq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vqaddq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vqaddq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> @@ -41143,17 +41143,17 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3));})
>>
>>   #define __arm_vqdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>>     int
>> (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve
>> _type_int8x16_t]: __arm_vqdmulhq_m_s8 (__ARM_mve_coerce(__p0,
>> int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vqdmulhq_m_s16 (__ARM_mve_coerce(__p0,
>> int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vqdmulhq_m_s32 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3));})
>> @@ -41164,15 +41164,15 @@ extern void *__ARM_undef;
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vqdmullbq_m_s16 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vqdmullbq_m_s32 (__ARM_mve_coerce(__p0,
>> int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmullbq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmullbq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmullbq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmullbq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3));})
>>
>>   #define __arm_vqdmulltq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulltq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulltq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulltq_m_n_s16 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>> +  int
>> (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int_n]: __arm_vqdmulltq_m_n_s32 (__ARM_mve_coerce(__p0,
>> int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2,
>> int), p3), \
>>     int
>> (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int16x8_t][__ARM_mve
>> _type_int16x8_t]: __arm_vqdmulltq_m_s16 (__ARM_mve_coerce(__p0,
>> int32x4_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>>     int
>> (*)[__ARM_mve_type_int64x2_t][__ARM_mve_type_int32x4_t][__ARM_mve
>> _type_int32x4_t]: __arm_vqdmulltq_m_s32 (__ARM_mve_coerce(__p0,
>> int64x2_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3));})
>>
>> @@ -41238,9 +41238,9 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaxq_p_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaxq_p_s16 (__ARM_mve_coerce(__p0,
>> int32_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaxq_p_s32 (__ARM_mve_coerce(__p0,
>> int32_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaxq_p_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaxq_p_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaxq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3));})
>>
>>   #define __arm_vmullbq_poly_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41311,51 +41311,51 @@ extern void *__ARM_undef;
>>   #define __arm_viwdupq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_viwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_viwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_viwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_viwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_viwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_viwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint32_t_ptr]:
>> __arm_viwdupq_m_wb_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32_t_ptr]:
>> __arm_viwdupq_m_wb_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t_ptr]:
>> __arm_viwdupq_m_wb_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4));})
>>
>>   #define __arm_viwdupq_u16(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> -  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u16
>> (__ARM_mve_coerce(__p0, uint32_t), p1, (const int) p2), \
>> +  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u16
>> (__ARM_mve_coerce3(p0, int), p1, (const int) p2), \
>>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u16
>> (__ARM_mve_coerce(__p0, uint32_t *), p1, (const int) p2));})
>>
>>   #define __arm_viwdupq_u32(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> -  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u32
>> (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
>> +  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u32
>> (__ARM_mve_coerce3(p0, int), p1, p2), \
>>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u32
>> (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
>>
>>   #define __arm_viwdupq_u8(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> -  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u8
>> (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
>> +  int (*)[__ARM_mve_type_int_n]: __arm_viwdupq_n_u8
>> (__ARM_mve_coerce3(p0, int), p1, p2), \
>>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_viwdupq_wb_u8
>> (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
>>
>>   #define __arm_vdwdupq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vdwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vdwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vdwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2, p3, p4), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vdwdupq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vdwdupq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vdwdupq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2, p3, p4), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint32_t_ptr]:
>> __arm_vdwdupq_m_wb_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32_t_ptr]:
>> __arm_vdwdupq_m_wb_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t_ptr]:
>> __arm_vdwdupq_m_wb_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t *), p2, p3, p4));})
>>
>>   #define __arm_vdwdupq_u16(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> -  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u16
>> (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
>> +  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u16
>> (__ARM_mve_coerce3(p0, int), p1, p2), \
>>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u16
>> (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
>>
>>   #define __arm_vdwdupq_u32(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> -  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u32
>> (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
>> +  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u32
>> (__ARM_mve_coerce3(p0, int), p1, p2), \
>>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u32
>> (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
>>
>>   #define __arm_vdwdupq_u8(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> -  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u8
>> (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
>> +  int (*)[__ARM_mve_type_int_n]: __arm_vdwdupq_n_u8
>> (__ARM_mve_coerce3(p0, int), p1, p2), \
>>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u8
>> (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
>>
>>   #define __arm_vshlcq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>> @@ -41392,14 +41392,14 @@ extern void *__ARM_undef;
>>   #define __arm_vaddlvaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddlvaq_s32 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddlvaq_u32 (__ARM_mve_coerce(__p0, uint64_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)));})
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddlvaq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddlvaq_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t)));})
>>
>>   #define __arm_vaddlvaq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddlvaq_p_s32 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddlvaq_p_u32 (__ARM_mve_coerce(__p0, uint64_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddlvaq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddlvaq_p_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2));})
>>
>>   #define __arm_vaddlvq(p0) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> @@ -41414,22 +41414,22 @@ extern void *__ARM_undef;
>>   #define __arm_vaddvaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vaddvaq_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vaddvaq_s16 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddvaq_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddvaq_u8 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddvaq_u16 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddvaq_u32 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)));})
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vaddvaq_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t)), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vaddvaq_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t)), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddvaq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t)), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddvaq_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddvaq_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddvaq_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t)));})
>>
>>   #define __arm_vaddvaq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vaddvaq_p_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vaddvaq_p_s16 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddvaq_p_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddvaq_p_u8 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddvaq_p_u16 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>> -  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddvaq_p_u32 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2));})
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t]:
>> __arm_vaddvaq_p_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t]:
>> __arm_vaddvaq_p_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t]:
>> __arm_vaddvaq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t]:
>> __arm_vaddvaq_p_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t]:
>> __arm_vaddvaq_p_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>> +  int (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t]:
>> __arm_vaddvaq_p_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2));})
>>
>>   #define __arm_vaddvq(p0) ({ __typeof(p0) __p0 = (p0); \
>>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>> @@ -41455,9 +41455,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpcsq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmpcsq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpcsq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)));})
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vcmpcsq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41465,9 +41465,9 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmpcsq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmpcsq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmpcsq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2));})
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmpcsq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2));})
>>
>>   #define __arm_vcmphiq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41475,16 +41475,16 @@ extern void *__ARM_undef;
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmphiq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t)), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmphiq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t)), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmphiq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t)), \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t)), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t)), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t)));})
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int)), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int)));})
>>
>>   #define __arm_vcmphiq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
>> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8_t), p2), \
>> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16_t), p2), \
>> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32_t), p2), \
>> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
>> __arm_vcmphiq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce3(p1, int), p2), \
>>     int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
>> __arm_vcmphiq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), p2), \
>>     int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
>> __arm_vcmphiq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
>> __ARM_mve_coerce(__p1, uint16x8_t), p2), \
>>     int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
>> __arm_vcmphiq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
>> __ARM_mve_coerce(__p1, uint32x4_t), p2));})
>> @@ -41581,34 +41581,34 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaq_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaq_s16 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaq_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaq_u8 (__ARM_mve_coerce(__p0, uint32_t),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaq_u16 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaq_u32 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaq_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaq_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaq_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaq_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaq_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>>
>>   #define __arm_vmladavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaq_p_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaq_p_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaq_p_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaq_p_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t), p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaq_p_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>>
>>   #define __arm_vmladavaxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaxq_s8 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaxq_s16 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaxq_s32 (__ARM_mve_coerce(__p0, int32_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaxq_u8 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaxq_u16 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaxq_u32 (__ARM_mve_coerce(__p0,
>> uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int8x16_t][__ARM_mve_typ
>> e_int8x16_t]: __arm_vmladavaxq_s8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2,
>> int8x16_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmladavaxq_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmladavaxq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint8x16_t][__ARM_mve_ty
>> pe_uint8x16_t]: __arm_vmladavaxq_u8 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2,
>> uint8x16_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmladavaxq_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmladavaxq_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>>
>>   #define __arm_vmladavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41651,17 +41651,17 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaq_s16 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaq_s32 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmlaldavaq_u16 (__ARM_mve_coerce(__p0,
>> uint64_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmlaldavaq_u32 (__ARM_mve_coerce(__p0,
>> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaq_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint16x8_t][__ARM_mve_ty
>> pe_uint16x8_t]: __arm_vmlaldavaq_u16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2,
>> uint16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vmlaldavaq_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>>
>>   #define __arm_vmlaldavaxq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaxq_s16 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaxq_s32 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int16x8_t][__ARM_mve_typ
>> e_int16x8_t]: __arm_vmlaldavaxq_s16 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2,
>> int16x8_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vmlaldavaxq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)));})
>>
>>   #define __arm_vmlaldavq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>> @@ -41856,15 +41856,15 @@ extern void *__ARM_undef;
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce(__p0, int64_t),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce(__p0,
>> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t)), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t)));})
>>
>>   #define __arm_vrmlaldavhaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>>     __typeof(p1) __p1 = (p1); \
>>     __typeof(p2) __p2 = (p2); \
>>     _Generic( (int
>> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typ
>> eid(__p2)])0, \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vrmlaldavhaq_p_s32 (__ARM_mve_coerce(__p0,
>> int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2,
>> int32x4_t), p3), \
>> -  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vrmlaldavhaq_p_u32 (__ARM_mve_coerce(__p0,
>> uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_int32x4_t][__ARM_mve_typ
>> e_int32x4_t]: __arm_vrmlaldavhaq_p_s32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t),
>> p3), \
>> +  int
>> (*)[__ARM_mve_type_int_n][__ARM_mve_type_uint32x4_t][__ARM_mve_ty
>> pe_uint32x4_t]: __arm_vrmlaldavhaq_p_u32 (__ARM_mve_coerce3(p0, int),
>> __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2,
>> uint32x4_t), p3));})
>>
>>   #define __arm_vstrbq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0);
>> \
>>     __typeof(p1) __p1 = (p1); \
>> --
>> 2.25.1

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
  2022-11-20 22:49     ` Ramana Radhakrishnan
@ 2022-11-21 14:11       ` Stam Markianos-Wright
  0 siblings, 0 replies; 82+ messages in thread
From: Stam Markianos-Wright @ 2022-11-21 14:11 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Kyrylo Tkachov
  Cc: Andrea Corallo, gcc-patches, Richard Earnshaw


On 11/20/22 22:49, Ramana Radhakrishnan wrote:
> On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Andrea Corallo <andrea.corallo@arm.com>
>>> Sent: Thursday, November 17, 2022 4:38 PM
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
>>> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
>>> Wright@arm.com>
>>> Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
>>> overloading [PR107515]
>>>
>>> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>
>>> This patch adds explicit references to other float types
>>> to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
>>>
>>> gcc/ChangeLog:
>>>          PR 107515
>>>          * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
>> Argh, I'm looking forward to when we move away from this _Generic business, but for now ok.
>> The ChangeLog should say "PR target/107515" for the git hook to recognize it IIRC.
> and the PR is against 11.x - is there a plan to back port this and
> dependent patches to relevant branches ?

Hi Ramana!

Assuming maintainer approval, we do hope to backport.

And yes, it would have to be the whole patch series, so that we carry
over all the improved testing as well (and we'll have to run it, of course).

Does that sound OK?

Thank you,
Stam


>
> Ramana
>
>> Thanks,
>> Kyrill
>>
>>> ---
>>>   gcc/config/arm/arm_mve.h | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
>>> index fd1876b57a0..f6b42dc3fab 100644
>>> --- a/gcc/config/arm/arm_mve.h
>>> +++ b/gcc/config/arm/arm_mve.h
>>> @@ -35582,6 +35582,9 @@ enum {
>>>        short: __ARM_mve_type_int_n, \
>>>        int: __ARM_mve_type_int_n, \
>>>        long: __ARM_mve_type_int_n, \
>>> +     _Float16: __ARM_mve_type_fp_n, \
>>> +     __fp16: __ARM_mve_type_fp_n, \
>>> +     float: __ARM_mve_type_fp_n, \
>>>        double: __ARM_mve_type_fp_n, \
>>>        long long: __ARM_mve_type_int_n, \
>>>        unsigned char: __ARM_mve_type_int_n, \
>>> --
>>> 2.25.1
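
For readers following the PR107515 discussion, here is a minimal host-side
sketch of why the explicit float entries matter. _Generic matches the exact
type of its controlling expression and applies no promotion, so a float
argument does not select a "double:" association. The SKETCH_* names and the
"default:" fallback below are hypothetical simplifications, not the actual
arm_mve.h code, and _Float16/__fp16 are left out because they are not
available on every host compiler.

```c
/* Hypothetical, simplified stand-in for __ARM_mve_typeid.
   None of these names exist in arm_mve.h.  */
enum { SKETCH_TYPE_UNKNOWN = 0, SKETCH_TYPE_INT_N = 1, SKETCH_TYPE_FP_N = 2 };

/* Before the fix: double is the only floating-point scalar listed,
   so a float argument falls through to the default association.  */
#define SKETCH_TYPEID_OLD(x) _Generic((x), \
    short: SKETCH_TYPE_INT_N, \
    int: SKETCH_TYPE_INT_N, \
    long: SKETCH_TYPE_INT_N, \
    double: SKETCH_TYPE_FP_N, \
    default: SKETCH_TYPE_UNKNOWN)

/* After the fix: float is listed explicitly next to double.  */
#define SKETCH_TYPEID_NEW(x) _Generic((x), \
    short: SKETCH_TYPE_INT_N, \
    int: SKETCH_TYPE_INT_N, \
    long: SKETCH_TYPE_INT_N, \
    float: SKETCH_TYPE_FP_N, \
    double: SKETCH_TYPE_FP_N, \
    default: SKETCH_TYPE_UNKNOWN)
```

With these definitions, SKETCH_TYPEID_OLD(1.0f) selects the default
association, while SKETCH_TYPEID_NEW(1.0f) selects SKETCH_TYPE_FP_N; a double
argument selects SKETCH_TYPE_FP_N in both.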

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 10/35] arm: improve tests for vabavq*
  2022-11-18 16:43   ` Kyrylo Tkachov
@ 2022-11-21 14:49     ` Andrea Corallo
  0 siblings, 0 replies; 82+ messages in thread
From: Andrea Corallo @ 2022-11-21 14:49 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: gcc-patches, Richard Earnshaw

Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:

>> -----Original Message-----
>> From: Andrea Corallo <andrea.corallo@arm.com>
>> Sent: Thursday, November 17, 2022 4:38 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
>> Subject: [PATCH 10/35] arm: improve tests for vabavq*
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	* gcc.target/arm/mve/intrinsics/vabavq_p_s16.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_p_s32.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_p_s8.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_p_u16.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_p_u32.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_p_u8.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_s16.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_s32.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_s8.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_u16.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_u32.c:
>> 	* gcc.target/arm/mve/intrinsics/vabavq_u8.c:
>
> Missing ChangeLog text?
> Ok with ChangeLog fixed.

Oops, sorry!

Thanks

  Andrea
  

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic
  2022-11-17 16:37 ` [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic Andrea Corallo
@ 2022-11-22 10:00   ` Christophe Lyon
  2022-11-22 10:54     ` Andrea Corallo
  2022-11-22 16:48   ` Kyrylo Tkachov
  1 sibling, 1 reply; 82+ messages in thread
From: Christophe Lyon @ 2022-11-22 10:00 UTC (permalink / raw)
  To: gcc-patches, Stam Markianos-Wright
  Cc: kyrylo.tkachov, Richard.Earnshaw, Andrea Corallo



On 11/17/22 17:37, Andrea Corallo via Gcc-patches wrote:
> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
> 
> In the past we had only defined the vsubq_x generic overload of the
> vsubq_x_* intrinsics for float vector types.  This would cause them
> to fall back to the `__ARM_undef` failure state if they were called
> through the generic version.
> This patch simply adds these overloads.
> 
> gcc/ChangeLog:
> 
>          * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
>           (__arm_vsubq_x Integer): New.

Hi Stam,

To hopefully help Kyrill in the review, I think this fix is tested by 
patch #19, where we now have
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
(this line explains why this bug was not noticed so far)

Thanks,

Christophe

> ---
>   gcc/config/arm/arm_mve.h | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index f6b42dc3fab..09167ec118e 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -38259,6 +38259,18 @@ extern void *__ARM_undef;
>   #define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>     __typeof(p2) __p2 = (p2); \
>     _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>     int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
>     int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
> @@ -40223,6 +40235,22 @@ extern void *__ARM_undef;
>     int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce1(p0, uint16_t *)), \
>     int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce1(p0, uint32_t *))))
>   
> +#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> +  __typeof(p2) __p2 = (p2); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> +
>   #define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>     _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>     int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic
  2022-11-22 10:00   ` Christophe Lyon
@ 2022-11-22 10:54     ` Andrea Corallo
  0 siblings, 0 replies; 82+ messages in thread
From: Andrea Corallo @ 2022-11-22 10:54 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: gcc-patches, Stam Markianos-Wright, kyrylo.tkachov, Richard.Earnshaw

Christophe Lyon <christophe.lyon@arm.com> writes:

> On 11/17/22 17:37, Andrea Corallo via Gcc-patches wrote:
>> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
>> In the past we had only defined the vsubq_x generic overload of the
>> vsubq_x_* intrinsics for float vector types.  This would cause them
>> to fall back to the `__ARM_undef` failure state if they were called
>> through the generic version.
>> This patch simply adds these overloads.
>> gcc/ChangeLog:
>>          * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
>>           (__arm_vsubq_x Integer): New.
>
> Hi Stam,
>
> To hopefully help Kyrill in the review, I think this fix is tested by
> patch #19, where we now have
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> (this line explains why this bug was not noticed so far)
>
> Thanks,
>
> Christophe

Exactly

PS: the fact that the tests now use 'check-function-bodies' should also
catch that.

Thanks

  Andrea
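
As an illustration of the dispatch scheme discussed in this sub-thread, the
sketch below mimics how the arm_mve.h macros encode the type ids of two
arguments into a pointer-to-array type and select on it with _Generic. It is
only a host-side approximation under stated assumptions: the DEMO_* names are
hypothetical, scalar types stand in for the MVE vector types, and the
"default:" association stands in for the `__ARM_undef` failure state.

```c
/* Hypothetical type ids; they start at 1 because they are used as
   array dimensions below.  */
enum { DEMO_ID_UNSUPPORTED = 1, DEMO_ID_INT = 2, DEMO_ID_FP = 3 };

/* Hypothetical result codes naming the overload that was selected.  */
enum { DEMO_KIND_UNDEF = 0, DEMO_KIND_INT = 1, DEMO_KIND_FP = 2 };

/* Simplified stand-in for __ARM_mve_typeid.  */
#define DEMO_TYPEID(x) _Generic((x), \
    int: DEMO_ID_INT, \
    float: DEMO_ID_FP, \
    default: DEMO_ID_UNSUPPORTED)

/* Only the floating-point pair is listed, as in the situation the
   patch description refers to: integer arguments hit the fallback.  */
#define DEMO_SUB_OLD(a, b) \
  _Generic((int (*)[DEMO_TYPEID(a)][DEMO_TYPEID(b)])0, \
    int (*)[DEMO_ID_FP][DEMO_ID_FP]: DEMO_KIND_FP, \
    default: DEMO_KIND_UNDEF)

/* With the integer pair added, integer arguments are dispatched.  */
#define DEMO_SUB_NEW(a, b) \
  _Generic((int (*)[DEMO_TYPEID(a)][DEMO_TYPEID(b)])0, \
    int (*)[DEMO_ID_INT][DEMO_ID_INT]: DEMO_KIND_INT, \
    int (*)[DEMO_ID_FP][DEMO_ID_FP]: DEMO_KIND_FP, \
    default: DEMO_KIND_UNDEF)
```

Here DEMO_SUB_OLD(1, 2) selects the fallback, whereas DEMO_SUB_NEW(1, 2)
selects the integer association. In the real testsuite, this class of failure
is what the scan-assembler-not "__ARM_undef" check quoted above is meant to
catch.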

^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic
  2022-11-17 16:37 ` [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic Andrea Corallo
  2022-11-22 10:00   ` Christophe Lyon
@ 2022-11-22 16:48   ` Kyrylo Tkachov
  1 sibling, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:48 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Stam Markianos-Wright



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Stam Markianos-Wright <Stam.Markianos-
> Wright@arm.com>
> Subject: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x
> instrinsic
> 
> From: Stam Markianos-Wright <stam.markianos-wright@arm.com>
> 
> In the past we had only defined the vsubq_x generic overload of the
> vsubq_x_* intrinsics for float vector types.  This would cause them
> to fall back to the `__ARM_undef` failure state if they were called
> through the generic version.
> This patch simply adds these overloads.

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
>         * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
>          (__arm_vsubq_x Integer): New.
> ---
>  gcc/config/arm/arm_mve.h | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index f6b42dc3fab..09167ec118e 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -38259,6 +38259,18 @@ extern void *__ARM_undef;
>  #define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> @@ -40223,6 +40235,22 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16
> (__ARM_mve_coerce1(p0, uint16_t *)), \
>    int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32
> (__ARM_mve_coerce1(p0, uint32_t *))))
> 
> +#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> +  __typeof(p2) __p2 = (p2); \
> +  _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3));})
> +
>  #define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8
> (__ARM_mve_coerce(__p0, int8x16_t), p1), \
> --
> 2.25.1
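
As a side note on the dispatch style used by the __arm_vsubq_x macro quoted above: the two operand type ids are encoded as array bounds of a pointer type, and _Generic then selects on that pointer type. Below is a minimal sketch of the same idea in plain C11. It is not the arm_mve.h implementation; TYPE_INT, TYPE_FLOAT, TYPEID, generic_sub and the sub_* helpers are hypothetical names used only for illustration, standing in for the __ARM_mve_type_* ids, __ARM_mve_typeid and the __arm_vsubq_x_* functions.

```c
/* Hypothetical type ids, standing in for the __ARM_mve_type_* values.  */
enum { TYPE_INT = 1, TYPE_FLOAT = 2 };

/* Map an expression to its type id, in the role of __ARM_mve_typeid.  */
#define TYPEID(x) _Generic ((x), int: TYPE_INT, float: TYPE_FLOAT)

/* Hypothetical per-type implementations.  */
int sub_int (int a, int b) { return a - b; }
float sub_float (float a, float b) { return a - b; }
float sub_float_n (float a, int b) { return a - (float) b; }

/* Each (id1, id2) pair produces a distinct pointer-to-array type, so a
   single _Generic can dispatch on both operand types at once.  */
#define generic_sub(a, b) \
  _Generic ((int (*)[TYPEID (a)][TYPEID (b)]) 0, \
    int (*)[TYPE_INT][TYPE_INT]: sub_int, \
    int (*)[TYPE_FLOAT][TYPE_FLOAT]: sub_float, \
    int (*)[TYPE_FLOAT][TYPE_INT]: sub_float_n) (a, b)
```

With these definitions, generic_sub (5, 3) resolves to sub_int, generic_sub (2.5f, 0.5f) to sub_float, and generic_sub (2.5f, 1) to sub_float_n. A pair with no matching association is rejected at compile time.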


* RE: [PATCH 17/35] arm: improve tests and fix vadd*
  2022-11-17 16:37 ` [PATCH 17/35] arm: improve tests and fix vadd* Andrea Corallo
@ 2022-11-22 16:49   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:49 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 17/35] arm: improve tests and fix vadd*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vaddlvq_p_<supf>v4si)
> 	(mve_vaddq_n_<supf><mode>, mve_vaddvaq_<supf><mode>)
> 	(mve_vaddlvaq_<supf>v4si, mve_vaddq_n_f<mode>)
> 	(mve_vaddlvaq_p_<supf>v4si, mve_vaddq<mode>,
> mve_vaddq_f<mode>):
> 	Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddlvq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddlvq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddq_x_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvaq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vaddvq_u8.c: Likewise.

Ok.
Thanks,
Kyrill
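
For context on why the spacing fix matters to the reworked tests: each check-function-bodies line in the patch below expects a tab between the mnemonic and its operands, so output that uses a space there would not match. The sketch approximates one such line with POSIX extended regular expressions in C. It is not how DejaGnu performs the match; line_matches and vaddlv_pattern are names made up here, the (?:...) groups are rewritten as plain groups because POSIX ERE lacks them, and the dot in the mnemonic is escaped for strictness.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true when LINE matches the extended regular expression PATTERN.
   The pattern is expected to carry its own ^...$ anchors.  */
bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* POSIX ERE rendering of the vaddlv.s32 body line from the tests: a tab
   after the mnemonic, two core registers, one Q register, and an
   optional tab-introduced "@" comment.  */
const char vaddlv_pattern[] =
  "^\tvaddlv\\.s32\t(ip|fp|r[0-9]+), (ip|fp|r[0-9]+), q[0-9]+(\t@.*)?$";
```

Under these assumptions, line_matches (vaddlv_pattern, "\tvaddlv.s32\tr0, r1, q0") holds, while the same line with a space after the mnemonic does not match.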

> ---
>  gcc/config/arm/mve.md                         | 18 ++++----
>  .../arm/mve/intrinsics/vaddlvaq_p_s32.c       | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddlvaq_p_u32.c       | 40 +++++++++++++++++-
>  .../arm/mve/intrinsics/vaddlvaq_s32.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vaddlvaq_u32.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddlvq_p_s32.c        | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddlvq_p_u32.c        | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddlvq_s32.c          | 22 +++++++---
>  .../arm/mve/intrinsics/vaddlvq_u32.c          | 20 +++++++--
>  .../gcc.target/arm/mve/intrinsics/vaddq_f16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_f32.c | 16 ++++++-
>  .../arm/mve/intrinsics/vaddq_m_f16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_f32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_f16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_f32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_s16.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_s32.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_s8.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_u16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_u32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_n_u8.c         | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_s16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_s32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_s8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_u16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_u32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_m_u8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_n_f16.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddq_n_f32.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddq_n_s16.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vaddq_n_s32.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vaddq_n_s8.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vaddq_n_u16.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddq_n_u32.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddq_n_u8.c           | 28 ++++++++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_s16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_s32.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_s8.c  | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_u16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_u32.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vaddq_u8.c  | 16 ++++++-
>  .../arm/mve/intrinsics/vaddq_x_f16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_f32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_f16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_f32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_s16.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_s32.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_s8.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_u16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_u32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_n_u8.c         | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_s16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_s32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_s8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_u16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_u32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddq_x_u8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vaddvaq_p_s16.c        | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_p_s32.c        | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_p_s8.c         | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_p_u16.c        | 40 +++++++++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_p_u32.c        | 40 +++++++++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_p_u8.c         | 40 +++++++++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_s16.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vaddvaq_s32.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vaddvaq_s8.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vaddvaq_u16.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_u32.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddvaq_u8.c           | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vaddvq_p_s16.c         | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvq_p_s32.c         | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvq_p_s8.c          | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvq_p_u16.c         | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvq_p_u32.c         | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvq_p_u8.c          | 24 ++++++++++-
>  .../arm/mve/intrinsics/vaddvq_s16.c           | 22 +++++++---
>  .../arm/mve/intrinsics/vaddvq_s32.c           | 22 +++++++---
>  .../gcc.target/arm/mve/intrinsics/vaddvq_s8.c | 20 +++++++--
>  .../arm/mve/intrinsics/vaddvq_u16.c           | 20 +++++++--
>  .../arm/mve/intrinsics/vaddvq_u32.c           | 20 +++++++--
>  .../gcc.target/arm/mve/intrinsics/vaddvq_u8.c | 20 +++++++--
>  81 files changed, 1864 insertions(+), 252 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index bc4e2f2ac21..5ce2a289225 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -636,7 +636,7 @@ (define_insn "mve_vaddlvq_<supf>v4si"
>  	 VADDLVQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vaddlv.<supf>32 %Q0, %R0, %q1"
> +  "vaddlv.<supf>32\t%Q0, %R0, %q1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -817,7 +817,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si"
>  	 VADDLVQ_P))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vaddlvt.<supf>32 %Q0, %R0, %q1"
> +  "vpst\;vaddlvt.<supf>32\t%Q0, %R0, %q1"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -879,7 +879,7 @@ (define_insn "mve_vaddq_n_<supf><mode>"
>  	 VADDQ_N))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vadd.i%#<V_sz_elem>	%q0, %q1, %2"
> +  "vadd.i%#<V_sz_elem>\t%q0, %q1, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -894,7 +894,7 @@ (define_insn "mve_vaddvaq_<supf><mode>"
>  	 VADDVAQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vaddva.<supf>%#<V_sz_elem>	%0, %q2"
> +  "vaddva.<supf>%#<V_sz_elem>\t%0, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -1834,7 +1834,7 @@ (define_insn "mve_vaddlvaq_<supf>v4si"
>  	 VADDLVAQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vaddlva.<supf>32 %Q0, %R0, %q2"
> +  "vaddlva.<supf>32\t%Q0, %R0, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -1849,7 +1849,7 @@ (define_insn "mve_vaddq_n_f<mode>"
>  	 VADDQ_N_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vadd.f%#<V_sz_elem>	%q0, %q1, %2"
> +  "vadd.f%#<V_sz_elem>\t%q0, %q1, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -3717,7 +3717,7 @@ (define_insn "mve_vaddlvaq_p_<supf>v4si"
>  	 VADDLVAQ_P))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vaddlvat.<supf>32 %Q0, %R0, %q2"
> +  "vpst\;vaddlvat.<supf>32\t%Q0, %R0, %q2"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
>  ;;
> @@ -8928,7 +8928,7 @@ (define_insn "mve_vaddq<mode>"
>  		    (match_operand:MVE_2 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vadd.i%#<V_sz_elem>  %q0, %q1, %q2"
> +  "vadd.i%#<V_sz_elem>\t%q0, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -8942,7 +8942,7 @@ (define_insn "mve_vaddq_f<mode>"
>  		    (match_operand:MVE_0 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vadd.f%#<V_sz_elem> %q0, %q1, %q2"
> +  "vadd.f%#<V_sz_elem>\t%q0, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
> index 0991ac1b355..3a9504df94e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo (int64_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddlvaq_p_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo1 (int64_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddlvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvat.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
> index 5af786e8e76..6e2613ee099 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo (uint64_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddlvaq_p_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvat.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo1 (uint64_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddlvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvat.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint64_t
> +foo2 (uint32x4_t b, mve_pred16_t p)
> +{
> +  return vaddlvaq_p (1, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
> index 78f155f1586..180dc9b2deb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddlva.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo (int64_t a, int32x4_t b)
>  {
>    return vaddlvaq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlva.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddlva.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo1 (int64_t a, int32x4_t b)
>  {
>    return vaddlvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlva.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
> index a7dfa2541ab..1f899e92c3c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddlva.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo (uint64_t a, uint32x4_t b)
>  {
>    return vaddlvaq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlva.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddlva.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo1 (uint64_t a, uint32x4_t b)
>  {
>    return vaddlvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlva.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vaddlva.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint64_t
> +foo2 (uint32x4_t b)
> +{
> +  return vaddlvaq (1, b);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
> index 8aa18323b53..5b22da49c1d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo (int32x4_t a, mve_pred16_t p)
>  {
>    return vaddlvq_p_s32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo1 (int32x4_t a, mve_pred16_t p)
>  {
>    return vaddlvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
> index a9cee74e2ee..2c85139435a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvt.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo (uint32x4_t a, mve_pred16_t p)
>  {
>    return vaddlvq_p_u32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddlvt.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo1 (uint32x4_t a, mve_pred16_t p)
>  {
>    return vaddlvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
> index 4bd70aacc05..bdb04b5214f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_s32.c
> @@ -1,21 +1,33 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddlv.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo (int32x4_t a)
>  {
>    return vaddlvq_s32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlv.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddlv.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo1 (int32x4_t a)
>  {
> -  return vaddlvq_s32 (a);
> +  return vaddlvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlv.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
> index 2148bd9a32e..bcd9d21df4f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddlv.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo (uint32x4_t a)
>  {
> -    return vaddlvq_u32 (a);
> +  return vaddlvq_u32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlv.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddlv.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo1 (uint32x4_t a)
>  {
> -    return vaddlvq (a);
> +  return vaddlvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddlv.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
> index 3d1100a9e81..58462177473 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vaddq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
> index e15e0d13e4f..f3fcd286f4d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vaddq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
> index 51d7020bd1f..291e65f32cc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_m_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
> index 7821bc241ff..0346f65a330 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_m_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
> index 796bed47613..9d57bbd27b9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
> +{
> +  return vaddq_m (inactive, a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
> index afa3c4c722e..9939aa0012d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
> +{
> +  return vaddq_m (inactive, a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
> index 0ef433724ba..50b138fc763 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
> index 46ac88e940d..66c2be777ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
> index 1867d5603d1..87dba75dff1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
> index 1da993b5e31..a8e9ea576b3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
> index d7404c9f4ce..045e5024d5d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
> index 013e83938b2..3d17afcbe56 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vaddq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
> index 244c88fcf89..87210a41dae 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
> index 7a59d75af11..1acb0b67fa9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
> index 5b8c74ab017..6136c54cbb8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
> index f28e3d789ab..b60d98e0691 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
> index aeb836ce87d..d56bbae9b03 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
> index c698df3a146..9f0b623c3e8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
> index 024fab5c0b2..5df23a6e61f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vaddq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vadd.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t a)
> +{
> +  return vaddq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
> index 06b1528460e..d07927c427e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_f32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vaddq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vadd.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t a)
> +{
> +  return vaddq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
> index 63765f41deb..9ae30406f51 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vaddq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
> index e462fbfab8e..3271d4d5af1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vaddq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
> index ad7181fd8f5..119fd5d5528 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vaddq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
> index dac7a9fb9ba..ef0722e4dcd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vaddq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
> index 2f1feb89d32..67513819f39 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vaddq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
> index 325bdade765..2aa79e5e916 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vaddq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
> index 31f6cb42e9f..24b12a6aee1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vaddq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
> index 96aead168cc..3fdfa3d86e6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vaddq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
> index 6676a2e269b..6b32b8ccfd5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vaddq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
> index 1b19876e09a..0deefa14ac6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vaddq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
> index 8f5acc69e79..44df963f0f8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vaddq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
> index e5be2fa1b59..7349fa165bf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vaddq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vadd.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vadd.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c
> index bd2a198eb72..b1d48a1d260 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_x_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c
> index 5369f4d4876..047043d6526 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_x_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c
> index d2eed8cf66f..ed67007df51 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vaddq_x (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c
> index 40d56da12b1..fa17d6b4aa2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vaddq_x (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c
> index e974cdf914b..d6c3252132a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c
> index a6ac9ccd3af..c2a861706d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c
> index f5539ef9c67..abc90a4c86b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c
> index f167df122a0..8866a07bc8e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vaddq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c
> index 653c3eed7a0..4123ad594ed 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vaddq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c
> index 0ad65c8dde5..d610930a311 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vaddq_x_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vaddq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c
> index 75b1491e17d..323010a6d33 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c
> index 1aadebda459..98773e7ba6f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c
> index d6b07cee79a..bff0bda1109 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c
> index 5c9abc2492a..85f5cd4db7a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c
> index d55ec735460..ad0e7afbc39 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c
> index bcc058b3769..a3cfc5686e2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_x_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vaddt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
> index c4bfe34aa91..16b51514be1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
> index cdc32807a24..bbf04aa0d08 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
> index d330411115a..f06623b1893 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
> index 74d9246cd63..7bfb4bb9cbe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint16x8_t b, mve_pred16_t p)
> +{
> +  return vaddvaq_p (1, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
> index e4ec42b2544..9aea5caa4fe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint32x4_t b, mve_pred16_t p)
> +{
> +  return vaddvaq_p (1, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
> index f9bed8379a4..b5113b209c0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vaddvaq_p (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvat.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvat.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint8x16_t b, mve_pred16_t p)
> +{
> +  return vaddvaq_p (1, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
> index 5f6a8cf9d89..1b9af185a0d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddva.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int16x8_t b)
>  {
>    return vaddvaq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddva.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int16x8_t b)
>  {
>    return vaddvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
> index 29e27f59328..e25487954d2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddva.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int32x4_t b)
>  {
>    return vaddvaq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddva.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int32x4_t b)
>  {
>    return vaddvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
> index cac43464679..d37c916c94d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddva.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32_t a, int8x16_t b)
>  {
>    return vaddvaq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddva.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32_t a, int8x16_t b)
>  {
>    return vaddvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
> index c943fa5789f..b3583ce5725 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddva.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint16x8_t b)
>  {
>    return vaddvaq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddva.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint16x8_t b)
>  {
>    return vaddvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vaddva.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint16x8_t b)
> +{
> +  return vaddvaq (1, b);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
> index 0950ff50d0f..006c0a3734f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddva.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint32x4_t b)
>  {
>    return vaddvaq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddva.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint32x4_t b)
>  {
>    return vaddvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vaddva.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint32x4_t b)
> +{
> +  return vaddvaq (1, b);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
> index 2a58225fbe3..cfe29bfd7be 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvaq_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddva.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32_t a, uint8x16_t b)
>  {
>    return vaddvaq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddva.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32_t a, uint8x16_t b)
>  {
>    return vaddvaq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vaddva.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vaddva.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint8x16_t b)
> +{
> +  return vaddvaq (1, b);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
> index a786b8974b7..3d19b46fdc6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int16x8_t a, mve_pred16_t p)
>  {
>    return vaddvq_p_s16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int16x8_t a, mve_pred16_t p)
>  {
>    return vaddvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
> index c688782180f..a148d15ead1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32x4_t a, mve_pred16_t p)
>  {
>    return vaddvq_p_s32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32x4_t a, mve_pred16_t p)
>  {
>    return vaddvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
> index 8438448f86c..f0b0c499d0d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int8x16_t a, mve_pred16_t p)
>  {
>    return vaddvq_p_s8 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int8x16_t a, mve_pred16_t p)
>  {
>    return vaddvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
> index ec7a5fa5a7f..2fb316c50ab 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint16x8_t a, mve_pred16_t p)
>  {
>    return vaddvq_p_u16 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint16x8_t a, mve_pred16_t p)
>  {
>    return vaddvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
> index b70968880ce..24bde90ec77 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32x4_t a, mve_pred16_t p)
>  {
>    return vaddvq_p_u32 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32x4_t a, mve_pred16_t p)
>  {
>    return vaddvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
> index 69381b78cc4..f6710941119 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint8x16_t a, mve_pred16_t p)
>  {
>    return vaddvq_p_u8 (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vaddvt.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint8x16_t a, mve_pred16_t p)
>  {
>    return vaddvq_p (a, p);
>  }
> 
> -/* { dg-final { scan-assembler "vaddvt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
> index b4fc11f4aa4..6b9a99f2b07 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s16.c
> @@ -1,21 +1,33 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int16x8_t a)
>  {
>    return vaddvq_s16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddv.s16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int16x8_t a)
>  {
> -  return vaddvq_s16 (a);
> +  return vaddvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
> index 438b46ec246..50823b65ecc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s32.c
> @@ -1,21 +1,33 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int32x4_t a)
>  {
>    return vaddvq_s32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddv.s32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int32x4_t a)
>  {
> -  return vaddvq_s32 (a);
> +  return vaddvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
> index b60b1f2da98..131edbe2b3f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_s8.c
> @@ -1,21 +1,33 @@
> -/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo (int8x16_t a)
>  {
>    return vaddvq_s8 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddv.s8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
>  foo1 (int8x16_t a)
>  {
>    return vaddvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
> index de782127faf..7c0ac0e1395 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint16x8_t a)
>  {
> -    return vaddvq_u16 (a);
> +  return vaddvq_u16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddv.u16	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint16x8_t a)
>  {
> -    return vaddvq (a);
> +  return vaddvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
> index c4672e42288..40779ed0f99 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint32x4_t a)
>  {
> -    return vaddvq_u32 (a);
> +  return vaddvq_u32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddv.u32	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint32x4_t a)
>  {
> -    return vaddvq (a);
> +  return vaddvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
> index e4e149cfb61..d2a6ba8f0fb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddvq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vaddv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo (uint8x16_t a)
>  {
> -    return vaddvq_u8 (a);
> +  return vaddvq_u8 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vaddv.u8	(?:ip|fp|r[0-9]+), q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
>  foo1 (uint8x16_t a)
>  {
> -    return vaddvq (a);
> +  return vaddvq (a);
>  }
> 
> -/* { dg-final { scan-assembler "vaddv.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 18/35] arm: improve tests for vmulq*
  2022-11-17 16:37 ` [PATCH 18/35] arm: improve tests for vmulq* Andrea Corallo
@ 2022-11-22 16:51   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:51 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 18/35] arm: improve tests for vmulq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vmulq_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vmulq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmulq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../gcc.target/arm/mve/intrinsics/vmulq_f16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_f32.c | 16 ++++++-
>  .../arm/mve/intrinsics/vmulq_m_f16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_f32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_f16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_f32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_s16.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_s32.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_s8.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_u16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_u32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_n_u8.c         | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_s16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_s32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_s8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_u16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_u32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_m_u8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_n_f16.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vmulq_n_f32.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vmulq_n_s16.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vmulq_n_s32.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vmulq_n_s8.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vmulq_n_u16.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vmulq_n_u32.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vmulq_n_u8.c           | 28 ++++++++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_s16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_s32.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_s8.c  | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_u16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_u32.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_u8.c  | 16 ++++++-
>  .../arm/mve/intrinsics/vmulq_x_f16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_f32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_f16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_f32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_s16.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_s32.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_s8.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_u16.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_u32.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_n_u8.c         | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_s16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_s32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_s8.c           | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_u16.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_u32.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vmulq_x_u8.c           | 26 ++++++++++--
>  48 files changed, 1148 insertions(+), 160 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
> index 68fb012ad34..9251809bfa1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vmulq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
> index 512661aeec7..3dacb7ad77c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vmulq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
> index d05d48f6261..8f47e962633 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_m_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
> index 8c2ec81da3b..41f3786e5fe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_m_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
> index 1f1d408d5b9..2f4fecbf56b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
> +{
> +  return vmulq_m (inactive, a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
> index 4aae0849e2b..2ad4108d637 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
> +{
> +  return vmulq_m (inactive, a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
> index 9a87f7d3643..b10bd5af687 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
> index da7d38b9968..e8bdf7278ad 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
> index 227b3a50a92..001e888e075 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
> index e09334df1de..5015f20a4be 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vmulq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
> index 62d6c262e5a..a6013a42721 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vmulq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
> index e7993ab3c31..42fc7264229 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vmulq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vmulq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
> index 61cdf656c19..04fdc010f5b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
> index 622407b96da..96178d02e37 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
> index bb2943cc727..aa3b8061122 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
> index a0680174753..e56ab77f3ee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
> index 586a32560d7..72e313cfd78 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
> index 0a8e49a5982..1ae6a93934c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
> index a3f693f06f7..d77aeb219ca 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vmulq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmul.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t a)
> +{
> +  return vmulq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
> index 5d1cfa368a7..9ef6a21b2bd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_f32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vmulq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmul.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t a)
> +{
> +  return vmulq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
> index 98e84cbf202..7ea25dce4a7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vmulq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
> index adbfd6fe10b..b884603ac5b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vmulq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
> index c845f108f88..8e6e17cd593 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vmulq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
> index e52acdc53b9..907bb0a4009 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vmulq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vmulq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
> index 9da4bc1f359..1164b29fc76 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vmulq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vmulq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
> index e0f152db729..ccc950e3ccf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vmulq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vmulq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
> index 89cc604fda0..a1fc1fc8f04 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vmulq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
> index f87fbf1249c..4fcf0dd88d1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vmulq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
> index 4e40065ad22..d0c147ef912 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vmulq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
> index ae95bf68afe..d4a24ba95b6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vmulq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
> index 4f8e9762d5f..c9194b73eaf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vmulq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
> index a3776ff8314..d69402021ec 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vmulq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmul.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vmulq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmul.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
> index 1f864cf481a..169871b47d8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_x_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
> index 07cc3d0277c..f800731b3ff 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_x_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
> index 8fa6c759d54..a4dc47725b5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vmulq_x (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
> index 654713c1348..e8428fe9b2d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vmulq_x (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
> index 4ec5ab397e1..27ef55d932a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
> index c52180067cf..929f420bd4c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
> index a2a7c734de8..31885a2d90f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
> index 419a3cb6ea6..5972a525092 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vmulq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
> index 5acfcf6bf61..3e02a542988 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vmulq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
> index 27e95ced0b5..9b59b189a5f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vmulq_x_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vmulq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
> index 5c232bfdc34..09b7169a68b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
> index 685fe45e4d0..a57ef2da840 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
> index 19ecc6bcafc..7fb5e007990 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
> index 0700ca818ab..7b1c6b2acc8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
> index a1cb2aa221e..bc53faff33f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
> index 3b29852c830..f43760861d4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmulq_x_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmult.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vmulq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmult.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 19/35] arm: improve tests and fix vsubq*
  2022-11-17 16:37 ` [PATCH 19/35] arm: improve tests and fix vsubq* Andrea Corallo
@ 2022-11-22 16:51   ` Kyrylo Tkachov
From: Kyrylo Tkachov @ 2022-11-22 16:51 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 19/35] arm: improve tests and fix vsubq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vsubq_n_f<mode>): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vsubq_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vsubq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsubq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill
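
For readers following the mve.md spacing fix quoted below: a minimal sketch of why the new function-body expectations need a tab after the mnemonic, whereas the old scan-assembler "vsub.f16" check accepted either separator. It approximates a single-line match with Python's re module and is not DejaGnu's check-function-bodies implementation. The pattern is modelled on the scalar-operand expectations in this patch; the helper name and the sample assembly lines (including register numbers) are hypothetical.

```python
import re

# Modelled on the scalar-operand expectations in this patch; \t is a tab.
# Assumption: each expectation line is matched in full against one asm line.
VSUB_N_F16 = re.compile(
    r"\tvsub.f16\tq[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:\t@.*|)"
)


def line_matches(asm_line: str) -> bool:
    """Return True if one assembly line matches the expectation in full."""
    return VSUB_N_F16.fullmatch(asm_line) is not None


# Hypothetical output lines; register numbers are illustrative only.
old_output = "\tvsub.f16  q0, q0, r0"  # two spaces, as in the old template
new_output = "\tvsub.f16\tq0, q0, r0"  # tab, as in the fixed template

print(line_matches(old_output))  # False
print(line_matches(new_output))  # True
```

The trailing (?:\t@.*|) group also lets the same pattern accept a line that ends with an assembler comment.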

> ---
>  gcc/config/arm/mve.md                         |  2 +-
>  .../gcc.target/arm/mve/intrinsics/vsubq_f16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_f32.c | 16 ++++++-
>  .../arm/mve/intrinsics/vsubq_m_f16.c          | 26 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_f32.c          | 26 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_f16.c        | 42 ++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_f32.c        | 42 ++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_s16.c        | 26 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_s32.c        | 26 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_s8.c         | 26 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_u16.c        | 42 ++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_u32.c        | 42 ++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_m_n_u8.c         | 42 ++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_m_s16.c          | 25 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_s32.c          | 25 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_s8.c           | 25 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_u16.c          | 25 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_u32.c          | 25 ++++++++--
>  .../arm/mve/intrinsics/vsubq_m_u8.c           | 25 ++++++++--
>  .../arm/mve/intrinsics/vsubq_n_f16.c          | 28 ++++++++++-
>  .../arm/mve/intrinsics/vsubq_n_f32.c          | 28 ++++++++++-
>  .../arm/mve/intrinsics/vsubq_n_s16.c          | 17 +++++--
>  .../arm/mve/intrinsics/vsubq_n_s32.c          | 17 +++++--
>  .../arm/mve/intrinsics/vsubq_n_s8.c           | 17 +++++--
>  .../arm/mve/intrinsics/vsubq_n_u16.c          | 29 +++++++++--
>  .../arm/mve/intrinsics/vsubq_n_u32.c          | 29 +++++++++--
>  .../arm/mve/intrinsics/vsubq_n_u8.c           | 29 +++++++++--
>  .../gcc.target/arm/mve/intrinsics/vsubq_s16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_s32.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_s8.c  | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_u16.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_u32.c | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_u8.c  | 16 ++++++-
>  .../arm/mve/intrinsics/vsubq_x_f16.c          | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_f32.c          | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_f16.c        | 48 +++++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_f32.c        | 48 +++++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_s16.c        | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_s32.c        | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_s8.c         | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_u16.c        | 48 +++++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_u32.c        | 48 +++++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_n_u8.c         | 48 +++++++++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_s16.c          | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_s32.c          | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_s8.c           | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_u16.c          | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_u32.c          | 32 +++++++++++--
>  .../arm/mve/intrinsics/vsubq_x_u8.c           | 32 +++++++++++--
>  49 files changed, 1261 insertions(+), 145 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 5ce2a289225..714dc6fc7ce 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -679,7 +679,7 @@ (define_insn "mve_vsubq_n_f<mode>"
>  	 VSUBQ_N_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vsub.f<V_sz_elem>  %q0, %q1, %2"
> +  "vsub.f<V_sz_elem>\t%q0, %q1, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
> index 8e3ce24fa49..3d82b081ca2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vsubq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
> index 5cb239d70fa..d0f64bb9872 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_f32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b)
>  {
>    return vsubq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32x4_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
> index f4b3f806822..434b0a7ced8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vsubq_m_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
> index 75dbf9335c9..0b8e056647e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_f32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vsubq_m_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
> index 556a0845087..abbd60060a7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_f16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t inactive, float16x8_t a, float16_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t inactive, float16x8_t a, mve_pred16_t p)
> +{
> +  return vsubq_m (inactive, a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
> index e53f5f1966a..40ca4284a1f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_f32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t inactive, float32x4_t a, float32_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t inactive, float32x4_t a, mve_pred16_t p)
> +{
> +  return vsubq_m (inactive, a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
> index 73443d500ba..f13eff8ad2d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
> index b4031111678..21ba17ba869 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
> index 5c4e1019225..c75b8b5420d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c
> index 04a3036ede8..700bc01833c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c
> index a21f9366373..25dd37ae5b2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c
> index 18f635f1e1a..4fed154d258 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vsubq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
> index 598d648887b..dde77dc51b7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vsubq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
> index af6750278f1..8770e31ad95 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vsubq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
> index 5effbe2e017..c9813313594 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vsubq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
> index 12218ae6791..eebc3ad6929 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vsubq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
> index 3a63eeb2b3d..d85bbec7ebf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vsubq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
> index a17a2741a47..a104a74e259 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vsubq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
> index 10e27dae907..4db52649ab4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16_t b)
>  {
>    return vsubq_n_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vsub.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t a)
> +{
> +  return vsubq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
> index 9e16d6c075c..fe97eed7d37 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32_t b)
>  {
>    return vsubq_n_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo1 (float32x4_t a, float32_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vsub.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t a)
> +{
> +  return vsubq (a, 1.1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c
> index 7f2af8691c0..d695fc83e06 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s16.c
> @@ -1,22 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vsubq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c
> index a5e6bf486fd..c281e21ab0c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s32.c
> @@ -1,22 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vsubq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c
> index 5754379358d..ef36b4d6330 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_s8.c
> @@ -1,22 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vsubq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c
> index ea0a3f9260c..be754d894a8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u16.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vsubq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c
> index cc409b59438..ef0aaa4cf08 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u32.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vsubq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c
> index 8a18a89b353..c55aefc3307 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_u8.c
> @@ -1,22 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vsubq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c
> index 15e732f1f66..469395452bd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vsubq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c
> index 5b4ee855711..0e60e1c6f60 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vsubq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c
> index b23893af605..882d63dfcf7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vsubq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c
> index edb5e354411..fe9baf3d52c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vsubq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c
> index 68040afd52b..b82051d69d5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vsubq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c
> index 92c4f059b0e..630b2f79f1f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vsubq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vsub.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vsub.i8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c
> index 4cb8be0ea7f..c48bea7e9f0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f16.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_f16 (a, b, p);
> +  return vsubq_x_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
> index f6711d7f207..d3e129bb6ee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_f32.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_f32 (a, b, p);
> +  return vsubq_x_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
> index c4adacbf5be..2dcaff58c09 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c
> @@ -1,15 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_f16 (a, b, p);
> +  return vsubq_x_n_f16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t a, mve_pred16_t p)
> +{
> +  return vsubq_x (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
> index a4affa0a3a9..92bafa3c4cc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c
> @@ -1,15 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32x4_t a, float32_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_f32 (a, b, p);
> +  return vsubq_x_n_f32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t a, mve_pred16_t p)
> +{
> +  return vsubq_x (a, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
> index 99c59b1a6c1..f01e8d7d490 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_s16 (a, b, p);
> +  return vsubq_x_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int16x8_t
> +foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
> index 6c29ebec05c..506966424cc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_s32 (a, b, p);
> +  return vsubq_x_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int32x4_t
> +foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
> index 0f83c305473..3c4a5d8129c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_s8 (a, b, p);
> +  return vsubq_x_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int8x16_t
> +foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
> index 9a372d762d1..958e5aa2ce8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c
> @@ -1,15 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_u16 (a, b, p);
> +  return vsubq_x_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vsubq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
> index 5219f154fa9..ba39c75bb2b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c
> @@ -1,15 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_u32 (a, b, p);
> +  return vsubq_x_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vsubq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
> index 0a0bcf8623a..19204d1d80f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c
> @@ -1,15 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_n_u8 (a, b, p);
> +  return vsubq_x_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vsubq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
> index 37936a6d647..8dcc5477c6f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s16.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_s16 (a, b, p);
> +  return vsubq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +int16x8_t
> +foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
> index c085f59c6a2..a2d43323227 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s32.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_s32 (a, b, p);
> +  return vsubq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +int32x4_t
> +foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
> index 361507821ea..8ead3d22439 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_s8.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_s8 (a, b, p);
> +  return vsubq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +int8x16_t
> +foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
> index 21423dc4f80..f0faf8165d2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u16.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_u16 (a, b, p);
> +  return vsubq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
> index 38dd09ad8f7..67a70931859 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u32.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_u32 (a, b, p);
> +  return vsubq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
> index 406cbf760fd..19002336cbd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_x_u8.c
> @@ -1,15 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
> -    return vsubq_x_u8 (a, b, p);
> +  return vsubq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vsubt.i8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vsubt.i8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
> +{
> +  return vsubq_x (a, b, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 20/35] arm: improve tests for vfmasq_m*
  2022-11-17 16:37 ` [PATCH 20/35] arm: improve tests for vfmasq_m* Andrea Corallo
@ 2022-11-22 16:52   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:52 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 20/35] arm: improve tests for vfmasq_m*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vfmasq_m_n_f16.c       | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vfmasq_m_n_f32.c       | 50 ++++++++++++++++---
>  2 files changed, 84 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> index 06d2d114e46..03b376c9bbe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vfmast.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
> -foo (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
> +foo (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m_n_f16 (a, b, c, p);
> +  return vfmasq_m_n_f16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vfmast.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
> -foo1 (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
> +foo1 (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m (a, b, c, p);
> +  return vfmasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vfmast.f16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t m1, float16x8_t m2, mve_pred16_t p)
> +{
> +  return vfmasq_m (m1, m2, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> index bf1773d0eeb..ecf30ba9826 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vfmast.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
> -foo (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
> +foo (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m_n_f32 (a, b, c, p);
> +  return vfmasq_m_n_f32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vfmast.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
> -foo1 (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
> +foo1 (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m (a, b, c, p);
> +  return vfmasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vfmast.f32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t m1, float32x4_t m2, mve_pred16_t p)
> +{
> +  return vfmasq_m (m1, m2, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 21/35] arm: improve tests for vhaddq_m*
  2022-11-17 16:37 ` [PATCH 21/35] arm: improve tests for vhaddq_m* Andrea Corallo
@ 2022-11-22 16:53   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:53 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 21/35] arm: improve tests for vhaddq_m*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vhaddq_m_n_s16.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_n_s32.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_n_s8.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_n_u16.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_n_u32.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_n_u8.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_s16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_s32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_s8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_u16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_u32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_m_u8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_n_s16.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_n_s32.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_n_s8.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_n_u16.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vhaddq_n_u32.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vhaddq_n_u8.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vhaddq_s16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_s32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vhaddq_s8.c | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_u16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_u32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vhaddq_u8.c | 16 ++++++-
>  .../arm/mve/intrinsics/vhaddq_x_n_s16.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_n_s32.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_n_s8.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_n_u16.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_n_u32.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_n_u8.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_s16.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_s32.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_s8.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_u16.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_u32.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhaddq_x_u8.c          | 25 +++++++++--
>  36 files changed, 828 insertions(+), 114 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
> index e90af963697..0bd03832ff5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c
> index fcce85fd1bd..42fe35dc746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c
> index 56558b7033a..1f4a4016c74 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c
> index d7ee0febab9..7d7ebebd638 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vhaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c
> index 1117b9813ce..31f7ee2fa54 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vhaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c
> index 90c66595d3f..2120472af46 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vhaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c
> index e8b87283a73..4b4ce40efb8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c
> index ddcfd11198e..e532055c675 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c
> index ef5fcd02cc5..25b81629ec3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c
> index d7b9aaab62c..4a9e9f3f438 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c
> index c8d7f6c4cf3..1e68099ebf2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c
> index 9792941b091..6dd75d7336e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c
> index d0d77f5a7fd..20a999da1d2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vhaddq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c
> index a8b4f3415a1..986cb8d3ba5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vhaddq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c
> index 2459ba0a7ab..57a4b36f5fe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vhaddq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c
> index cd681e7a5f9..abed33b0e37 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vhaddq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vhadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vhaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c
> index d2cb7f6284e..5e5204fb3a7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vhaddq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vhadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vhaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c
> index 509e1746259..b35221ef81b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vhaddq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vhadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vhaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c
> index 47afc591cdb..310964f3440 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vhaddq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c
> index fdc6476d0ee..d8222645c21 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vhaddq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c
> index 3321765e909..85b2feee346 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vhaddq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c
> index ad46355feab..2da0aa053e5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vhaddq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
> index 7477585fe55..49b865a123b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vhaddq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
> index 9edf8e5eb90..5ecd3cbf6ec 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vhaddq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vhaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhadd.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
> index 5a9302129c7..a4e277d4e1f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
> index 0a4ef00afa1..c79b88d6ced 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
> index ae6c27a8878..61893536231 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
> index ddc99a82f79..146d226f36f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vhaddq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
> index dce9bc212e2..b70014fb6a5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vhaddq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
> index 262c5937a91..03978dfa28a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vhaddq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
> index 65df0093401..c3c787583dd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
> index 7ff76e7170a..a1ab196d3d2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
> index 23f545c45cd..061ae89315e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
> index 97674c1f73c..0ee88520f8f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
> index b6404ce9d17..0a0e512c5fc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
> index 7c2d74a2662..c495641c532 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhaddt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhaddq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1

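Note on the patterns quoted above: the old directives only searched the
assembly for a substring such as "vhaddt.s16", while the new
check-function-bodies blocks also constrain the operands and the order of
the vmsr / vpst / vhaddt lines. The sketch below is a simplified stand-in
written for illustration; it is not the DejaGnu implementation. It uses
Python's re module rather than Tcl regexps, the helper names body_regex and
body_matches are hypothetical, and the GOOD/BAD assembly snippets are made
up for the example rather than taken from real compiler output.

```python
import re


def body_regex(expected_lines):
    """Join expected lines into one regex; "..." matches any run of lines."""
    parts = []
    for line in expected_lines:
        if line == "...":
            parts.append(r"(?:.*\n)*?")   # zero or more whole lines
        else:
            parts.append(line + r"\n")    # exactly this line
    return re.compile("".join(parts))


def body_matches(expected_lines, asm_body):
    """True when the whole function body matches the expected lines."""
    return body_regex(expected_lines).fullmatch(asm_body) is not None


# Patterns copied from the vhaddq_x_n_s16.c expectation (tabs written as \t).
EXPECTED = [
    "...",
    r"\tvmsr\tp0, (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
    r"\tvpst(?:\t@.*|)",
    "...",
    r"\tvhaddt.s16\tq[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
]

# Invented bodies: GOOD uses a core register as last operand, BAD a q register.
GOOD = "\tvmsr\tp0, r1\t@ movhi\n\tvpst\n\tvhaddt.s16\tq0, q0, r0\n\tbx\tlr\n"
BAD = "\tvmsr\tp0, r1\n\tvpst\n\tvhaddt.s16\tq0, q0, q1\n\tbx\tlr\n"

new_good = body_matches(EXPECTED, GOOD)
new_bad = body_matches(EXPECTED, BAD)
old_bad = re.search("vhaddt.s16", BAD) is not None  # old substring-style scan
```

Under these assumptions the substring-style scan accepts the BAD body
(old_bad is true) even though the last operand is a vector register, whereas
the body-level match rejects it (new_bad is false) and still accepts GOOD.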


* RE: [PATCH 22/35] arm: improve tests for vhsubq_m*
  2022-11-17 16:37 ` [PATCH 22/35] arm: improve tests for vhsubq_m* Andrea Corallo
@ 2022-11-22 16:53   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:53 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 22/35] arm: improve tests for vhsubq_m*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vhsubq_m_n_s16.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_n_s32.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_n_s8.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_n_u16.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_n_u32.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_n_u8.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_s16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_s32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_s8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_u16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_u32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_m_u8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_n_s16.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_n_s32.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_n_s8.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_n_u16.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vhsubq_n_u32.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vhsubq_n_u8.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vhsubq_s16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_s32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vhsubq_s8.c | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_u16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_u32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vhsubq_u8.c | 16 ++++++-
>  .../arm/mve/intrinsics/vhsubq_x_n_s16.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_n_s32.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_n_s8.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_n_u16.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_n_u32.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_n_u8.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_s16.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_s32.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_s8.c          | 25 +++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_u16.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_u32.c         | 25 +++++++++--
>  .../arm/mve/intrinsics/vhsubq_x_u8.c          | 25 +++++++++--
>  36 files changed, 828 insertions(+), 114 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
> index 27dcb7be957..6390589808f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
> index 75ae735f30d..db09d0f2c21 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
> index 84cdeb42952..89ea3f2aaf8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
> index bc6610c3812..e6fb8be673b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vhsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c
> index e94bfc95027..7ab815d5623 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vhsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c
> index c2a5674afd1..0bf695aded4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vhsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c
> index 9f62a385554..3bad177ad28 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c
> index 486ae6b7d58..cc5cdb07059 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c
> index 9faaa4fbb0d..4c651091e59 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c
> index aa5838cdad2..daed202c055 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c
> index 00282ad6444..cf71e6dab13 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c
> index 187d5bcf8a1..a8183dd48ed 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c
> index ce766486aed..af4f534d7ff 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vhsubq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c
> index 1d820ffaf5a..941d38074a4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vhsubq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c
> index 90110b78f0d..9ceb4ef3c6f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vhsubq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c
> index e744ef58663..037ed2c637d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vhsubq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vhsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vhsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c
> index b1ce3f07904..f51eb10ecbf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vhsubq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vhsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vhsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c
> index 68872a8f900..24dd45db152 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vhsubq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vhsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vhsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c
> index 03bd6d595cb..0f275d48753 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vhsubq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c
> index 515acb84e66..21aeb9d2a59 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vhsubq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c
> index 41fb2589924..b3ee94341b5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vhsubq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c
> index dda18779dca..690ef2de5ba 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vhsubq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c
> index 86a5576bedf..cfe12573fa0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vhsubq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c
> index d339ca0e5e4..1926bc34219 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vhsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vhsubq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vhsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vhsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vhsub.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c
> index 09da5c2f040..fcda4c541a6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c
> index f3c032987bc..55637221f21 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c
> index 1d86f7d72b3..ecfe188f3fa 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c
> index df6b7ea427a..bf3d6c38b85 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a, mve_pred16_t p)
> +{
> +  return vhsubq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c
> index bea6f2d1f96..4ae75b09950 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a, mve_pred16_t p)
> +{
> +  return vhsubq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c
> index e1fafd7a9f5..edfa4216a31 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a, mve_pred16_t p)
> +{
> +  return vhsubq_x (a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c
> index c9d3ffb45b7..bd2771b0978 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c
> index 36343cffc85..0ea40df3d9e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c
> index d1b134fe480..90ee94defb0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c
> index 4da0fb3f340..d700741169a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c
> index dfb0a6d371f..f43c9626829 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c
> index d549892ef8b..a0908ba786b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vhsubt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vhsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vhsubq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 23/35] arm: improve tests for viwdupq*
  2022-11-17 16:37 ` [PATCH 23/35] arm: improve tests for viwdupq* Andrea Corallo
@ 2022-11-22 16:54   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:54 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 23/35] arm: improve tests for viwdupq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c: Improve tests.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/viwdupq_m_n_u16.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_m_n_u32.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_m_n_u8.c       | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u16.c     | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u32.c     | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u8.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_n_u16.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/viwdupq_n_u32.c        | 32 ++++++++++--
>  .../arm/mve/intrinsics/viwdupq_n_u8.c         | 28 ++++++++++-
>  .../arm/mve/intrinsics/viwdupq_wb_u16.c       | 36 ++++++++++---
>  .../arm/mve/intrinsics/viwdupq_wb_u32.c       | 36 ++++++++++---
>  .../arm/mve/intrinsics/viwdupq_wb_u8.c        | 36 ++++++++++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u16.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u32.c      | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u8.c       | 46 ++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u16.c     | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u32.c     | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u8.c      | 50 ++++++++++++++++---
>  18 files changed, 658 insertions(+), 106 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> index 0f999cc672b..67a2465f435 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_n_u16 (inactive, a, b, 2, p);
> +  return viwdupq_m_n_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 2, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c
> index f79c91eaf4c..9fc2518acc5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_n_u32 (inactive, a, b, 4, p);
> +  return viwdupq_m_n_u32 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 4, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c
> index c0fee9fa752..39f4071bfa1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_n_u8 (inactive, a, b, 8, p);
> +  return viwdupq_m_n_u8 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 8, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c
> index 468ba179f62..8bb680e0d77 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_wb_u16 (inactive, a, b, 2, p);
> +  return viwdupq_m_wb_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 2, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c
> index e9190302717..2dc8d5f3442 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_wb_u32 (inactive, a, b, 4, p);
> +  return viwdupq_m_wb_u32 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 4, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c
> index 309ce95a333..ff3a5f520e8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_wb_u8 (inactive, a, b, 8, p);
> +  return viwdupq_m_wb_u8 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 8, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c
> index 599d9078464..5f37290759a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a, uint32_t b)
>  {
> -  return viwdupq_n_u16 (a, b, 2);
> +  return viwdupq_n_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a, uint32_t b)
>  {
> -  return viwdupq_u16 (a, b, 2);
> +  return viwdupq_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return viwdupq_u16 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c
> index 7c2af74b3f0..de93f8a7ec4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, uint32_t b)
>  {
> -  return viwdupq_n_u32 (a, b, 4);
> +  return viwdupq_n_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a, uint32_t b)
>  {
> -  return viwdupq_u32 (a, b, 4);
> +  return viwdupq_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return viwdupq_u32 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c
> index 4ff60791f3b..089025c3401 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a, uint32_t b)
>  {
>    return viwdupq_n_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a, uint32_t b)
>  {
>    return viwdupq_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return viwdupq_u8 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c
> index 1e5ce88dcca..fc3e9c6fac4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (uint32_t * a, uint32_t b)
> +foo (uint32_t *a, uint32_t b)
>  {
> -  return viwdupq_wb_u16 (a, b, 4);
> +  return viwdupq_wb_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (uint32_t * a, uint32_t b)
> +foo1 (uint32_t *a, uint32_t b)
>  {
> -  return viwdupq_u16 (a, b, 4);
> +  return viwdupq_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	viwdup.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 ()
> +{
> +  return viwdupq_u16 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c
> index 0c076f7b751..4c098dd8f02 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (uint32_t * a, uint32_t b)
> +foo (uint32_t *a, uint32_t b)
>  {
> -  return viwdupq_wb_u32 (a, b, 8);
> +  return viwdupq_wb_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (uint32_t * a, uint32_t b)
> +foo1 (uint32_t *a, uint32_t b)
>  {
> -  return viwdupq_u32 (a, b, 8);
> +  return viwdupq_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	viwdup.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 ()
> +{
> +  return viwdupq_u32 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c
> index 9e5118ba2b6..44cb53fe344 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (uint32_t * a, uint32_t b)
> +foo (uint32_t *a, uint32_t b)
>  {
> -  return viwdupq_wb_u8 (a, b, 2);
> +  return viwdupq_wb_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (uint32_t * a, uint32_t b)
> +foo1 (uint32_t *a, uint32_t b)
>  {
> -  return viwdupq_u8 (a, b, 2);
> +  return viwdupq_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "viwdup.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	viwdup.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 ()
> +{
> +  return viwdupq_u8 (1, 1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c
> index fdaf6be282d..2242877881f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_n_u16 (a, b, 2, p);
> +  return viwdupq_x_n_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_u16 (a, b, 2, p);
> +  return viwdupq_x_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (mve_pred16_t p)
> +{
> +  return viwdupq_x_u16 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c
> index affc6162015..4b2b650e21a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_n_u32 (a, b, 4, p);
> +  return viwdupq_x_n_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_u32 (a, b, 4, p);
> +  return viwdupq_x_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (mve_pred16_t p)
> +{
> +  return viwdupq_x_u32 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c
> index 8137c623c2a..873952b6c2e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_n_u8 (a, b, 8, p);
> +  return viwdupq_x_n_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_u8 (a, b, 8, p);
> +  return viwdupq_x_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (mve_pred16_t p)
> +{
> +  return viwdupq_x_u8 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c
> index d7aa141f384..b6c94797380 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_wb_u16 (a, b, 8, p);
> +  return viwdupq_x_wb_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_u16 (a, b, 8, p);
> +  return viwdupq_x_u16 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (mve_pred16_t p)
> +{
> +  return viwdupq_x_u16 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c
> index 7fe56963452..5fd84963d01 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_wb_u32 (a, b, 2, p);
> +  return viwdupq_x_wb_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_u32 (a, b, 2, p);
> +  return viwdupq_x_u32 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u32	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (mve_pred16_t p)
> +{
> +  return viwdupq_x_u32 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c
> index 8e3ecefdedb..abbb40fa8da 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_wb_u8 (a, b, 4, p);
> +  return viwdupq_x_wb_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (uint32_t * a, uint32_t b, mve_pred16_t p)
> +foo1 (uint32_t *a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_x_u8 (a, b, 4, p);
> +  return viwdupq_x_u8 (a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	viwdupt.u8	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (mve_pred16_t p)
> +{
> +  return viwdupq_x_u8 (1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1
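
For readers of the viwdupq* tests quoted above, the behaviour the VIWDUP
instruction is expected to have can be sketched in scalar C.  This is only
an illustrative model and not part of the patch: the name viwdup_u32_model
is hypothetical, and the sketch assumes the well-defined case where the
start offset is below the wrap limit and both are multiples of the
increment.

```c
#include <stdint.h>

/* Hypothetical scalar model of VIWDUP.U32 (not taken from the patch).
   Writes four lanes starting at OFFSET, stepping by IMM, and resets the
   running value to 0 whenever it reaches WRAP.  Returns the updated
   offset, i.e. the value the _wb intrinsics are expected to write back.
   Assumption: OFFSET < WRAP and both are multiples of IMM.  */
static uint32_t
viwdup_u32_model (uint32_t out[4], uint32_t offset, uint32_t wrap,
		  uint32_t imm)
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = offset;
      offset += imm;
      if (offset == wrap)
	offset = 0;
    }
  return offset;
}
```

With offset 0, wrap 3 and increment 1 the model yields the lanes 0, 1, 2, 0
and an updated offset of 1.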



* RE: [PATCH 24/35] arm: improve tests for vmladavaq*
  2022-11-17 16:37 ` [PATCH 24/35] arm: improve tests for vmladavaq* Andrea Corallo
@ 2022-11-22 16:54   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:54 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 24/35] arm: improve tests for vmladavaq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c: Improve tests.
> 	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c: Likewise.

Ok.
Thanks,
Kyrill
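
As a reading aid for the vmladavaq_p* tests quoted below, here is a rough
scalar sketch of what the predicated intrinsic is expected to compute.  It
is not part of the patch: the name vmladavaq_p_s16_model is hypothetical,
and the sketch assumes that a 16-bit lane counts as active when the lowest
of its two predicate bits is set.

```c
#include <stdint.h>

/* Hypothetical scalar model of vmladavaq_p_s16 (not taken from the patch).
   Adds to ADD the products M1[i] * M2[i] of every active lane.
   Assumption: lane i is active when bit 2*i of the predicate P is set.
   The sum is accumulated in an unsigned type so that it wraps modulo
   2^32 without invoking signed-overflow undefined behaviour.  */
static int32_t
vmladavaq_p_s16_model (int32_t add, const int16_t m1[8],
		       const int16_t m2[8], uint16_t p)
{
  uint32_t acc = (uint32_t) add;
  for (int i = 0; i < 8; i++)
    if (p & (1u << (2 * i)))
      acc += (uint32_t) ((int32_t) m1[i] * (int32_t) m2[i]);
  return (int32_t) acc;
}
```

With an all-ones predicate this reduces to add plus the full dot product;
with a zero predicate it returns add unchanged, which is why the tests still
expect the vmsr/vpst pair in front of the vmladavat instruction.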

> ---
>  .../arm/mve/intrinsics/vmladavaq_p_s16.c      | 33 ++++++++++---
>  .../arm/mve/intrinsics/vmladavaq_p_s32.c      | 33 ++++++++++---
>  .../arm/mve/intrinsics/vmladavaq_p_s8.c       | 33 ++++++++++---
>  .../arm/mve/intrinsics/vmladavaq_p_u16.c      | 49 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmladavaq_p_u32.c      | 49 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmladavaq_p_u8.c       | 49 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s16.c     | 33 ++++++++++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s32.c     | 33 ++++++++++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s8.c      | 33 ++++++++++---
>  .../arm/mve/intrinsics/vmladavaxq_s16.c       | 24 ++++++---
>  .../arm/mve/intrinsics/vmladavaxq_s32.c       | 24 ++++++---
>  .../arm/mve/intrinsics/vmladavaxq_s8.c        | 24 ++++++---
>  12 files changed, 336 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> index e458204c41b..f3e5eba3b08 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s16 (a, b, c, p);
> +  return vmladavaq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> index e3544787adb..71f6957bfc5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s32 (a, b, c, p);
> +  return vmladavaq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo1 (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c
> index 1d4ca722f44..a74317aeff9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
> +foo (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s8 (a, b, c, p);
> +  return vmladavaq_p_s8 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
> +foo1 (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s8"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c
> index 91a11c8b8b1..9ac84d46a07 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
> +foo (uint32_t add, uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_u16 (a, b, c, p);
> +  return vmladavaq_p_u16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo1 (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
> +foo1 (uint32_t add, uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.u16"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
> +{
> +  return vmladavaq_p (1, m1, m2, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c
> index 0efe8d0902f..4a3d109ed90 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
> +foo (uint32_t add, uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_u32 (a, b, c, p);
> +  return vmladavaq_p_u32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo1 (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
> +foo1 (uint32_t add, uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.u32"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
> +{
> +  return vmladavaq_p (1, m1, m2, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c
> index a8da9b0d2ef..a17440f4675 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c
> @@ -1,22 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
> +foo (uint32_t add, uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_u8 (a, b, c, p);
> +  return vmladavaq_p_u8 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32_t
> -foo1 (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
> +foo1 (uint32_t add, uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.u8"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavat.u8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint32_t
> +foo2 (uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
> +{
> +  return vmladavaq_p (1, m1, m2, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c
> index 838717e3e43..f201d5fa047 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavaxt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaxq_p_s16 (a, b, c, p);
> +  return vmladavaxq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavaxt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavaxt.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaxq_p (a, b, c, p);
> +  return vmladavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavaxt.s16"  }  } */
> -/* { dg-final { scan-assembler "vmladavaxt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c
> index a50c5ecf802..c90647a5064 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavaxt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaxq_p_s32 (a, b, c, p);
> +  return vmladavaxq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavaxt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavaxt.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo1 (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaxq_p (a, b, c, p);
> +  return vmladavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavaxt.s32"  }  } */
> -/* { dg-final { scan-assembler "vmladavaxt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c
> index e4705cecad9..57af7bc1c78 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavaxt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
> +foo (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
>  {
> -  return vmladavaxq_p_s8 (a, b, c, p);
> +  return vmladavaxq_p_s8 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavaxt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmladavaxt.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
> +foo1 (int32_t add, int8x16_t m1, int8x16_t m2, mve_pred16_t p)
>  {
> -  return vmladavaxq_p (a, b, c, p);
> +  return vmladavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavaxt.s8"  }  } */
> -/* { dg-final { scan-assembler "vmladavaxt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c
> index ffd542a062f..684580d1c36 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmladavax.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int16x8_t b, int16x8_t c)
> +foo (int32_t add, int16x8_t m1, int16x8_t m2)
>  {
> -  return vmladavaxq_s16 (a, b, c);
> +  return vmladavaxq_s16 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavax.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmladavax.s16	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int16x8_t b, int16x8_t c)
> +foo1 (int32_t add, int16x8_t m1, int16x8_t m2)
>  {
> -  return vmladavaxq (a, b, c);
> +  return vmladavaxq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavax.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c
> index b91e54d79e6..5d152647b55 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmladavax.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int32x4_t b, int32x4_t c)
> +foo (int32_t add, int32x4_t m1, int32x4_t m2)
>  {
> -  return vmladavaxq_s32 (a, b, c);
> +  return vmladavaxq_s32 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavax.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmladavax.s32	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int32x4_t b, int32x4_t c)
> +foo1 (int32_t add, int32x4_t m1, int32x4_t m2)
>  {
> -  return vmladavaxq (a, b, c);
> +  return vmladavaxq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavax.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c
> index 61949c416fc..71bcdc9b55e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmladavax.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo (int32_t a, int8x16_t b, int8x16_t c)
> +foo (int32_t add, int8x16_t m1, int8x16_t m2)
>  {
> -  return vmladavaxq_s8 (a, b, c);
> +  return vmladavaxq_s8 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavax.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmladavax.s8	(?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32_t
> -foo1 (int32_t a, int8x16_t b, int8x16_t c)
> +foo1 (int32_t add, int8x16_t m1, int8x16_t m2)
>  {
> -  return vmladavaxq (a, b, c);
> +  return vmladavaxq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavax.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1
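
A side note on the expected-body lines used in the tests above. As far as I understand check-function-bodies, each "**" line is a regular expression matched against one line of the function's assembly, and "..." stands for any run of lines. In the vmladavaxt.s16 expectation, (?:ip|fp|r[0-9]+) accepts whichever core register gets allocated and the trailing (?:	@.*|) tolerates an optional assembler comment. The sketch below is host-side C and not part of the patch. It uses POSIX regex, which has no (?:...) groups, so the pattern is rewritten with plain groups and the optional comment becomes (\t@.*)?; both rewrites, the helper name line_matches and the array name are assumptions made for this illustration only.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if LINE matches the POSIX extended
   regular expression PATTERN, 0 if it does not, and -1 if PATTERN
   fails to compile.  */
int
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* POSIX ERE rendering of the vmladavaxt.s16 expectation, anchored at
   both ends.  The '.' before "s16" is left unescaped, as in the test.  */
const char vmladavaxt_s16_pattern[] =
  "^\tvmladavaxt.s16\t(ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(\t@.*)?$";
```

With this, line_matches (vmladavaxt_s16_pattern, "\tvmladavaxt.s16\tr0, q0, q1") accepts the line, while the same instruction written with a space after the mnemonic is rejected.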



* RE: [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*
  2022-11-17 16:37 ` [PATCH 25/35] arm: improve tests and fix vmlaldavaxq* Andrea Corallo
@ 2022-11-22 16:56   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:56 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vmlaldavaq_<supf><mode>)
> 	(mve_vmlaldavaxq_s<mode>, mve_vmlaldavaxq_p_<supf><mode>): Fix
> 	spacing vs tabs.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c: Improve tests.
> 	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md                         |  6 ++--
>  .../arm/mve/intrinsics/vmlaldavaxq_p_s16.c    | 32 +++++++++++++++----
>  .../arm/mve/intrinsics/vmlaldavaxq_p_s32.c    | 32 +++++++++++++++----
>  .../arm/mve/intrinsics/vmlaldavaxq_s16.c      | 24 ++++++++++----
>  .../arm/mve/intrinsics/vmlaldavaxq_s32.c      | 24 ++++++++++----
>  5 files changed, 91 insertions(+), 27 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 714dc6fc7ce..d2ffae6a425 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -4163,7 +4163,7 @@ (define_insn "mve_vmlaldavaq_<supf><mode>"
>  	 VMLALDAVAQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vmlaldava.<supf>%#<V_sz_elem> %Q0, %R0, %q2, %q3"
> +  "vmlaldava.<supf>%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -4179,7 +4179,7 @@ (define_insn "mve_vmlaldavaxq_s<mode>"
>  	 VMLALDAVAXQ_S))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vmlaldavax.s%#<V_sz_elem> %Q0, %R0, %q2, %q3"
> +  "vmlaldavax.s%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -6126,7 +6126,7 @@ (define_insn "mve_vmlaldavaxq_p_<supf><mode>"
>  	 VMLALDAVAXQ_P))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vmlaldavaxt.<supf>%#<V_sz_elem> %Q0, %R0, %q2, %q3"
> +  "vpst\;vmlaldavaxt.<supf>%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> index f33d3880236..87f0354a636 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlaldavaxt.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p_s16 (a, b, c, p);
> +  return vmlaldavaxq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlaldavaxt.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo1 (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p (a, b, c, p);
> +  return vmlaldavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> index ab072a9850e..d26bf5b90af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlaldavaxt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p_s32 (a, b, c, p);
> +  return vmlaldavaxq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlaldavaxt.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo1 (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p (a, b, c, p);
> +  return vmlaldavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c
> index e68fbd2df94..3a37e7a58a9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlaldavax.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo (int64_t a, int16x8_t b, int16x8_t c)
> +foo (int64_t add, int16x8_t m1, int16x8_t m2)
>  {
> -  return vmlaldavaxq_s16 (a, b, c);
> +  return vmlaldavaxq_s16 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavax.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlaldavax.s16	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo1 (int64_t a, int16x8_t b, int16x8_t c)
> +foo1 (int64_t add, int16x8_t m1, int16x8_t m2)
>  {
> -  return vmlaldavaxq (a, b, c);
> +  return vmlaldavaxq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavax.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c
> index 7b6fea289da..155b8be70f0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlaldavax.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo (int64_t a, int32x4_t b, int32x4_t c)
> +foo (int64_t add, int32x4_t m1, int32x4_t m2)
>  {
> -  return vmlaldavaxq_s32 (a, b, c);
> +  return vmlaldavaxq_s32 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavax.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlaldavax.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
> -foo1 (int64_t a, int32x4_t b, int32x4_t c)
> +foo1 (int64_t add, int32x4_t m1, int32x4_t m2)
>  {
> -  return vmlaldavaxq (a, b, c);
> +  return vmlaldavaxq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavax.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1
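
For readers wondering why the mve.md hunks matter to the tests: the old output templates put a space between the mnemonic and its operands, while the new expected bodies have a tab there, so the old output could not match. The small host-side C sketch below, which is not part of the patch, makes the difference visible. The helper name mnemonic_separator is hypothetical, and the sample lines are hand-written stand-ins for compiler output rather than captured output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the character that separates the
   mnemonic from its operands in an assembler line, skipping one
   leading tab if present.  Returns '\0' when the line has no
   operands (for example a bare "vpst").  */
char
mnemonic_separator (const char *line)
{
  if (*line == '\t')
    line++;
  /* Length of the mnemonic: everything up to the first space or tab.  */
  size_t len = strcspn (line, " \t");
  return line[len];
}
```

Applied to a line in the old style, "\tvmlaldavax.s16 r0, r1, q0, q1", the helper returns a space; applied to the new style, "\tvmlaldavax.s16\tr0, r1, q0, q1", it returns a tab, which is what the patterns in the updated tests require.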



* RE: [PATCH 26/35] arm: improve tests for vmlasq*
  2022-11-17 16:38 ` [PATCH 26/35] arm: improve tests for vmlasq* Andrea Corallo
@ 2022-11-22 16:56   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:56 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 26/35] arm: improve tests for vmlasq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmlasq_m_n_s16.c       | 34 ++++++++++---
>  .../arm/mve/intrinsics/vmlasq_m_n_s32.c       | 34 ++++++++++---
>  .../arm/mve/intrinsics/vmlasq_m_n_s8.c        | 34 ++++++++++---
>  .../arm/mve/intrinsics/vmlasq_m_n_u16.c       | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmlasq_m_n_u32.c       | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmlasq_m_n_u8.c        | 50 ++++++++++++++++---
>  .../arm/mve/intrinsics/vmlasq_n_s16.c         | 24 ++++++---
>  .../arm/mve/intrinsics/vmlasq_n_s32.c         | 24 ++++++---
>  .../arm/mve/intrinsics/vmlasq_n_s8.c          | 24 ++++++---
>  .../arm/mve/intrinsics/vmlasq_n_u16.c         | 36 ++++++++++---
>  .../arm/mve/intrinsics/vmlasq_n_u32.c         | 36 ++++++++++---
>  .../arm/mve/intrinsics/vmlasq_n_u8.c          | 36 ++++++++++---
>  12 files changed, 348 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> index bf66e616ec7..af6e588adad 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s16 (a, b, c, p);
> +  return vmlasq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> index 53c21e2e5b6..9d0cc3076d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s32 (a, b, c, p);
> +  return vmlasq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c
> index ac08b15fdbe..772ad8b1e76 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s8 (a, b, c, p);
> +  return vmlasq_m_n_s8 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo1 (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c
> index 99f1e28c7d5..b02dc64a31b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (uint16x8_t a, uint16x8_t b, uint16_t c, mve_pred16_t p)
> +foo (uint16x8_t m1, uint16x8_t m2, uint16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_u16 (a, b, c, p);
> +  return vmlasq_m_n_u16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (uint16x8_t a, uint16x8_t b, uint16_t c, mve_pred16_t p)
> +foo1 (uint16x8_t m1, uint16x8_t m2, uint16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t m1, uint16x8_t m2, mve_pred16_t p)
> +{
> +  return vmlasq_m (m1, m2, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c
> index 8d8edca6024..0214cf2136e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (uint32x4_t a, uint32x4_t b, uint32_t c, mve_pred16_t p)
> +foo (uint32x4_t m1, uint32x4_t m2, uint32_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_u32 (a, b, c, p);
> +  return vmlasq_m_n_u32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (uint32x4_t a, uint32x4_t b, uint32_t c, mve_pred16_t p)
> +foo1 (uint32x4_t m1, uint32x4_t m2, uint32_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t m1, uint32x4_t m2, mve_pred16_t p)
> +{
> +  return vmlasq_m (m1, m2, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c
> index e7f685bbcaa..c9824e332f7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (uint8x16_t a, uint8x16_t b, uint8_t c, mve_pred16_t p)
> +foo (uint8x16_t m1, uint8x16_t m2, uint8_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_u8 (a, b, c, p);
> +  return vmlasq_m_n_u8 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (uint8x16_t a, uint8x16_t b, uint8_t c, mve_pred16_t p)
> +foo1 (uint8x16_t m1, uint8x16_t m2, uint8_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vmlast.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t m1, uint8x16_t m2, mve_pred16_t p)
> +{
> +  return vmlasq_m (m1, m2, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c
> index 8bfe3c31096..6708a741790 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlas.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add)
>  {
> -  return vmlasq_n_s16 (a, b, c);
> +  return vmlasq_n_s16 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlas.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add)
>  {
> -  return vmlasq (a, b, c);
> +  return vmlasq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c
> index db06182abec..4e8bf32e016 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlas.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add)
>  {
> -  return vmlasq_n_s32 (a, b, c);
> +  return vmlasq_n_s32 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlas.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add)
>  {
> -  return vmlasq (a, b, c);
> +  return vmlasq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
> index 3a151650ef4..1cb1a31459c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlas.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c)
> +foo (int8x16_t m1, int8x16_t m2, int8_t add)
>  {
> -  return vmlasq_n_s8 (a, b, c);
> +  return vmlasq_n_s8 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlas.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c)
> +foo1 (int8x16_t m1, int8x16_t m2, int8_t add)
>  {
> -  return vmlasq (a, b, c);
> +  return vmlasq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
> index b9444f2f6a3..e03c91ef298 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlas.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo (uint16x8_t a, uint16x8_t b, uint16_t c)
> +foo (uint16x8_t m1, uint16x8_t m2, uint16_t add)
>  {
> -  return vmlasq_n_u16 (a, b, c);
> +  return vmlasq_n_u16 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlas.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
> -foo1 (uint16x8_t a, uint16x8_t b, uint16_t c)
> +foo1 (uint16x8_t m1, uint16x8_t m2, uint16_t add)
>  {
> -  return vmlasq (a, b, c);
> +  return vmlasq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmlas.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t m1, uint16x8_t m2)
> +{
> +  return vmlasq (m1, m2, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
> index 5708a0658a6..b80c3c7631f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlas.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo (uint32x4_t a, uint32x4_t b, uint32_t c)
> +foo (uint32x4_t m1, uint32x4_t m2, uint32_t add)
>  {
> -  return vmlasq_n_u32 (a, b, c);
> +  return vmlasq_n_u32 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlas.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
> -foo1 (uint32x4_t a, uint32x4_t b, uint32_t c)
> +foo1 (uint32x4_t m1, uint32x4_t m2, uint32_t add)
>  {
> -  return vmlasq (a, b, c);
> +  return vmlasq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmlas.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t m1, uint32x4_t m2)
> +{
> +  return vmlasq (m1, m2, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
> index d83940c7232..0f37550160e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmlas.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo (uint8x16_t a, uint8x16_t b, uint8_t c)
> +foo (uint8x16_t m1, uint8x16_t m2, uint8_t add)
>  {
> -  return vmlasq_n_u8 (a, b, c);
> +  return vmlasq_n_u8 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmlas.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
> -foo1 (uint8x16_t a, uint8x16_t b, uint8_t c)
> +foo1 (uint8x16_t m1, uint8x16_t m2, uint8_t add)
>  {
> -  return vmlasq (a, b, c);
> +  return vmlasq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vmlas.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmlas.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t m1, uint8x16_t m2)
> +{
> +  return vmlasq (m1, m2, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 27/35] arm: improve tests for vqaddq_m*
  2022-11-17 16:38 ` [PATCH 27/35] arm: improve tests for vqaddq_m* Andrea Corallo
@ 2022-11-22 16:57   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:57 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 27/35] arm: improve tests for vqaddq_m*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqaddq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqaddq_m_n_s16.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_n_s32.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_n_s8.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_n_u16.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_n_u32.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_n_u8.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_s16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_s32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_s8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_u16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_u32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_m_u8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqaddq_n_s16.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vqaddq_n_s32.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vqaddq_n_s8.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vqaddq_n_u16.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vqaddq_n_u32.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vqaddq_n_u8.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vqaddq_s16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vqaddq_s32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vqaddq_s8.c | 16 ++++++-
>  .../arm/mve/intrinsics/vqaddq_u16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vqaddq_u32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vqaddq_u8.c | 16 ++++++-
>  24 files changed, 516 insertions(+), 72 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> index 65d3f770fe2..a659373d441 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
> index 4499a0eaa41..8ffc6a67762 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
> index d3e1d555cb1..2e88b7fabac 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
> index baadfe72e8d..61cf9fcf2aa 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vqaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
> index 80808777d9a..bbd255ac1f1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vqaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
> index 32f2894422d..9cee8c65333 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vqaddq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
> index d5b7fa63f6a..8bb8a957423 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
> index 015bc3eb206..9959724fc11 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
> index b241fddd069..6b918978880 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
> index fa752355d64..c0a8d9ba9c8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
> index 0729b6bb30f..7a72ce57840 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
> index f1541658399..f7e6ca9b5a4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqaddt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
> index 5eeda2bc2dd..0fac7abeac0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vqaddq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
> index 5b914d18b98..d750b1f2c14 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vqaddq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c
> index 06f22c2b8df..5fc796edf75 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vqaddq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c
> index 5403f0b6646..decad65c188 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vqaddq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vqadd.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vqaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c
> index 77185808a16..b0a6d79093e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vqaddq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vqadd.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vqaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c
> index f0fa9bf3f5d..f9ca9a1f042 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vqaddq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vqadd.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vqaddq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c
> index 83cd3475a6f..ffa31463372 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vqaddq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c
> index d26dd206912..c5937a967ff 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vqaddq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c
> index de03264b4cc..9f937512811 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vqaddq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c
> index cd4efc1dd7c..aa4be43f244 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vqaddq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c
> index 8b3afb4bd04..daef60eb5ca 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vqaddq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c
> index da2ff1bb25c..e28807ec708 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vqaddq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqadd.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vqaddq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqadd.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 28/35] arm: improve tests for vqdmlahq_m*
  2022-11-17 16:38 ` [PATCH 28/35] arm: improve tests for vqdmlahq_m* Andrea Corallo
@ 2022-11-22 16:57   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:57 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 28/35] arm: improve tests for vqdmlahq_m*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: Likewise.

Ok.
Thanks,
Kyrill


> ---
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s16.c     | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s32.c     | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s8.c      | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqdmlahq_n_s16.c       | 24 +++++++++----
>  .../arm/mve/intrinsics/vqdmlahq_n_s32.c       | 24 +++++++++----
>  .../arm/mve/intrinsics/vqdmlahq_n_s8.c        | 24 +++++++++----
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c    | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c    | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c     | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqdmlashq_n_s16.c      | 24 +++++++++----
>  .../arm/mve/intrinsics/vqdmlashq_n_s32.c      | 24 +++++++++----
>  .../arm/mve/intrinsics/vqdmlashq_n_s8.c       | 24 +++++++++----
>  12 files changed, 264 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> index d8c4f4bab8e..94d93874542 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s16 (a, b, c, p);
> +  return vqdmlahq_m_n_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m (a, b, c, p);
> +  return vqdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> index 361f5d00bdf..a3dab7fa02e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s32 (a, b, c, p);
> +  return vqdmlahq_m_n_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m (a, b, c, p);
> +  return vqdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c
> index a9eaea89ba4..610580478a3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s8 (a, b, c, p);
> +  return vqdmlahq_m_n_s8 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo1 (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m (a, b, c, p);
> +  return vqdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c
> index c109dd47444..210bacec2fb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2)
>  {
> -  return vqdmlahq_n_s16 (a, b, c);
> +  return vqdmlahq_n_s16 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlah.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2)
>  {
> -  return vqdmlahq (a, b, c);
> +  return vqdmlahq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlah.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c
> index 752d9d9e3e0..dbb2494b216 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2)
>  {
> -  return vqdmlahq_n_s32 (a, b, c);
> +  return vqdmlahq_n_s32 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlah.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c)
> +foo1 (int32x4_t add, int32x4_t m1, int32_t m2)
>  {
> -  return vqdmlahq (a, b, c);
> +  return vqdmlahq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlah.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c
> index 8dffa0e1852..a7962f82d38 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c)
> +foo (int8x16_t add, int8x16_t m1, int8_t m2)
>  {
> -  return vqdmlahq_n_s8 (a, b, c);
> +  return vqdmlahq_n_s8 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlah.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c)
> +foo1 (int8x16_t add, int8x16_t m1, int8_t m2)
>  {
> -  return vqdmlahq (a, b, c);
> +  return vqdmlahq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlah.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
> index 7c2e5cf89dd..34d407f0142 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqdmlashq_m_n_s16 (a, b, c, p);
> +  return vqdmlashq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlasht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqdmlashq_m (a, b, c, p);
> +  return vqdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlasht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
> index cea9d9b683f..50a665ea7e5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqdmlashq_m_n_s32 (a, b, c, p);
> +  return vqdmlashq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlasht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqdmlashq_m (a, b, c, p);
> +  return vqdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlasht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
> index 83ee258876a..45f34b60382 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
>  {
> -  return vqdmlashq_m_n_s8 (a, b, c, p);
> +  return vqdmlashq_m_n_s8 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlasht.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo1 (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
>  {
> -  return vqdmlashq_m (a, b, c, p);
> +  return vqdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlasht.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
> index c71a61c54f6..a3f1ae8d6b8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmlash.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add)
>  {
> -  return vqdmlashq_n_s16 (a, b, c);
> +  return vqdmlashq_n_s16 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlash.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmlash.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add)
>  {
> -  return vqdmlashq (a, b, c);
> +  return vqdmlashq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlash.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
> index 61f6c6671cc..cf867e56874 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmlash.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add)
>  {
> -  return vqdmlashq_n_s32 (a, b, c);
> +  return vqdmlashq_n_s32 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlash.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmlash.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add)
>  {
> -  return vqdmlashq (a, b, c);
> +  return vqdmlashq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlash.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c
> index a07892863c1..7e9362cab60 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmlash.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c)
> +foo (int8x16_t m1, int8x16_t m2, int8_t add)
>  {
> -  return vqdmlashq_n_s8 (a, b, c);
> +  return vqdmlashq_n_s8 (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlash.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmlash.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c)
> +foo1 (int8x16_t m1, int8x16_t m2, int8_t add)
>  {
> -  return vqdmlashq (a, b, c);
> +  return vqdmlashq (m1, m2, add);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmlash.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 29/35] arm: improve tests for vqdmul*
  2022-11-17 16:38 ` [PATCH 29/35] arm: improve tests for vqdmul* Andrea Corallo
@ 2022-11-22 16:58   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 16:58 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 29/35] arm: improve tests for vqdmul*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c: Improve
> 	tests.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s16.c     | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s32.c     | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s8.c      | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulhq_m_s16.c       | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulhq_m_s32.c       | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulhq_m_s8.c        | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulhq_n_s16.c       | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulhq_n_s32.c       | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulhq_n_s8.c        | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulhq_s16.c         | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulhq_s32.c         | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulhq_s8.c          | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmullbq_m_n_s16.c    | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmullbq_m_n_s32.c    | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmullbq_m_s16.c      | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmullbq_m_s32.c      | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmullbq_n_s16.c      | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmullbq_n_s32.c      | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmullbq_s16.c        | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmullbq_s32.c        | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulltq_m_n_s16.c    | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulltq_m_n_s32.c    | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulltq_m_s16.c      | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulltq_m_s32.c      | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vqdmulltq_n_s16.c      | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulltq_n_s32.c      | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulltq_s16.c        | 16 ++++++++++--
>  .../arm/mve/intrinsics/vqdmulltq_s32.c        | 16 ++++++++++--
>  28 files changed, 504 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> index 57ab85eaf52..a5c1a106205 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c
> index 256353a0a21..c78d4db1591 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c
> index c24be9ed5ad..b5ab6eb292c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c
> index 49efeefcf63..2f5fb0e53a4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c
> index a5614830622..80a938a8a5b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c
> index 2e016f57e35..bfb755af4ee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulht.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vqdmulhq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulht.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c
> index 19534b60b27..e34689d203d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmulh.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vqdmulhq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmulh.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vqdmulhq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c
> index eff9f6ecc4b..f967b8a286a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmulh.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vqdmulhq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmulh.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vqdmulhq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c
> index 188cf7c616f..5e1928fd51b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmulh.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vqdmulhq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmulh.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vqdmulhq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c
> index 513a30f67e6..7c0a434e48f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmulh.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vqdmulhq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmulh.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vqdmulhq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c
> index 9cf147dc7c5..19f4b03f6f0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmulh.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vqdmulhq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmulh.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vqdmulhq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c
> index 87211ad054a..1784c967f3c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmulh.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vqdmulhq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmulh.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vqdmulhq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmulh.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c
> index f0a4ad5b9f4..4f96e192732 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c
> index 1c7b2e4a1fc..d0bca6e3015 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c
> index 6a056cf86a1..8448cdc88cf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c
> index 019c536e7f2..48cddcd791e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmullbt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqdmullbq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmullbt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c
> index ec501c34539..cd7c394139d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullb.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vqdmullbq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullb.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vqdmullbq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c
> index 78fe3d6b289..b4d82f55987 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullb.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vqdmullbq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullb.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vqdmullbq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c
> index 9a423d3cc66..6f0fdabf67f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullb.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vqdmullbq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullb.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vqdmullbq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c
> index f0278cd8a86..2bf952bfd77 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullb.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vqdmullbq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullb.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vqdmullbq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullb.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
> index 85f03149da4..6c756ebf3e7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
> index 6bb5004e201..e46f6b2c384 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int64x2_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
> index a85393b5bc1..8526b3ad628 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
> index 82f25b2ebbe..809e0740e46 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqdmulltt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int64x2_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqdmulltq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmulltt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
> index f9ad32a8411..44f0036bc51 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vqdmulltq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vqdmulltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
> index 311b023431e..b025886ff15 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vqdmulltq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vqdmulltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
> index 851f27a63b6..95084876349 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vqdmulltq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vqdmulltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
> index 1e81cc3dea5..ab27aeddc29 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqdmullt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vqdmulltq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqdmullt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vqdmulltq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqdmullt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


* RE: [PATCH 30/35] arm: improve tests for vqrdmlahq*
  2022-11-17 16:38 ` [PATCH 30/35] arm: improve tests for vqrdmlahq* Andrea Corallo
@ 2022-11-22 17:01   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 17:01 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 30/35] arm: improve tests for vqrdmlahq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c: Improve
> test.
> 	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s16.c    | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s32.c    | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s8.c     | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqrdmlahq_n_s16.c      | 24 +++++++++----
>  .../arm/mve/intrinsics/vqrdmlahq_n_s32.c      | 24 +++++++++----
>  .../arm/mve/intrinsics/vqrdmlahq_n_s8.c       | 24 +++++++++----
>  6 files changed, 132 insertions(+), 42 deletions(-)
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> index 70c3fa0e9b1..07d689279ac 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s16 (a, b, c, p);
> +  return vqrdmlahq_m_n_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> index 75ed9911276..3b02ca16038 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s32 (a, b, c, p);
> +  return vqrdmlahq_m_n_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> index ddaea545f40..b661bdcb4cf 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s8 (a, b, c, p);
> +  return vqrdmlahq_m_n_s8 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlaht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo1 (int8x16_t add, int8x16_t m1, int8_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
> index 45e74971838..16804735b32 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqrdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2)
>  {
> -  return vqrdmlahq_n_s16 (a, b, c);
> +  return vqrdmlahq_n_s16 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqrdmlah.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqrdmlah.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2)
>  {
> -  return vqrdmlahq (a, b, c);
> +  return vqrdmlahq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqrdmlah.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
> index 79bb9c98b12..d7d3dc06d7f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqrdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2)
>  {
> -  return vqrdmlahq_n_s32 (a, b, c);
> +  return vqrdmlahq_n_s32 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqrdmlah.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqrdmlah.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c)
> +foo1 (int32x4_t add, int32x4_t m1, int32_t m2)
>  {
> -  return vqrdmlahq (a, b, c);
> +  return vqrdmlahq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqrdmlah.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
> index 220518ae698..d3f9f25f11c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqrdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c)
> +foo (int8x16_t add, int8x16_t m1, int8_t m2)
>  {
> -  return vqrdmlahq_n_s8 (a, b, c);
> +  return vqrdmlahq_n_s8 (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqrdmlah.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqrdmlah.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c)
> +foo1 (int8x16_t add, int8x16_t m1, int8_t m2)
>  {
> -  return vqrdmlahq (a, b, c);
> +  return vqrdmlahq (add, m1, m2);
>  }
> 
> -/* { dg-final { scan-assembler "vqrdmlah.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


* RE: [PATCH 31/35] arm: improve tests for vqrdmlashq_m*
  2022-11-17 16:38 ` [PATCH 31/35] arm: improve tests for vqrdmlashq_m* Andrea Corallo
@ 2022-11-22 17:02   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 17:02 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 31/35] arm: improve tests for vqrdmlashq_m*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c:

The ChangeLog entries list the files but are missing their descriptions.
Ok with that fixed.
Thanks,
Kyrill
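
For illustration only: a sketch of how the completed entries could read,
assuming the same "Improve test." / "Likewise." wording used in the
ChangeLog of patch 30/35 in this series. The text actually committed may
differ.

```
gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c: Likewise.
```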

> ---
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s16.c   | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s32.c   | 34 ++++++++++++++-----
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s8.c    | 34 ++++++++++++++-----
>  3 files changed, 78 insertions(+), 24 deletions(-)
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> index 35b9618ca47..da4d724bb46 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s16 (a, b, c, p);
> +  return vqrdmlashq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlasht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> index 8517835eb61..2430f1cb102 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s32 (a, b, c, p);
> +  return vqrdmlashq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlasht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> index e42cc63fa74..30915b24e5e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s8 (a, b, c, p);
> +  return vqrdmlashq_m_n_s8 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqrdmlasht.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
> -foo1 (int8x16_t a, int8x16_t b, int8_t c, mve_pred16_t p)
> +foo1 (int8x16_t m1, int8x16_t m2, int8_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


* RE: [PATCH 32/35] arm: improve tests for vqsubq*
  2022-11-17 16:38 ` [PATCH 32/35] arm: improve tests for vqsubq* Andrea Corallo
@ 2022-11-22 17:03   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 17:03 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 32/35] arm: improve tests for vqsubq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_s16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_s32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_s8.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_u16.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_u32.c:
> 	* gcc.target/arm/mve/intrinsics/vqsubq_u8.c:

The ChangeLog entries are missing their descriptive text.
Ok with the ChangeLog fixed.
Kyrill

> ---
>  .../arm/mve/intrinsics/vqsubq_m_n_s16.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_n_s32.c       | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_n_s8.c        | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_n_u16.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_n_u32.c       | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_n_u8.c        | 42 +++++++++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_s16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_s32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_s8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_u16.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_u32.c         | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_m_u8.c          | 26 ++++++++++--
>  .../arm/mve/intrinsics/vqsubq_n_s16.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vqsubq_n_s32.c         | 16 ++++++-
>  .../arm/mve/intrinsics/vqsubq_n_s8.c          | 16 ++++++-
>  .../arm/mve/intrinsics/vqsubq_n_u16.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vqsubq_n_u32.c         | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vqsubq_n_u8.c          | 28 ++++++++++++-
>  .../arm/mve/intrinsics/vqsubq_s16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vqsubq_s32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vqsubq_s8.c | 16 ++++++-
>  .../arm/mve/intrinsics/vqsubq_u16.c           | 16 ++++++-
>  .../arm/mve/intrinsics/vqsubq_u32.c           | 16 ++++++-
>  .../gcc.target/arm/mve/intrinsics/vqsubq_u8.c | 16 ++++++-
>  24 files changed, 516 insertions(+), 72 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> index abcff4f0e3c..39b8089919d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
> index 23e59ff12a2..ed6b92ddcf5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_n_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c
> index d783ab55f65..c69ed2aeb84 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_n_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c
> index 5244efb340c..57ba7428bef 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_n_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, uint16x8_t a, mve_pred16_t p)
> +{
> +  return vqsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c
> index 4427f87f456..eda9e74309d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_n_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t inactive, uint32x4_t a, mve_pred16_t p)
> +{
> +  return vqsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c
> index 0abfa5dc132..f6f61b52f52 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_n_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t inactive, uint8x16_t a, mve_pred16_t p)
> +{
> +  return vqsubq_m (inactive, a, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c
> index faa189f8466..1a8ea29e83e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c
> index 62a4dd0979f..c49b7497f6d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c
> index 71fb6f5632e..17d6471bcd9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c
> index 68d642dfef5..0ce93fdf9be 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c
> index 8f76c5f47da..1eac57545b3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c
> index af335ae9752..56bdda2da6e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vqsubt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
>  {
>    return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c
> index 33a79180289..b9a46f5ff6f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16_t b)
>  {
>    return vqsubq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c
> index a2b338839fa..732e6c01b78 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vqsubq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c
> index e8d7e99d19d..fb3c4404fba 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8_t b)
>  {
>    return vqsubq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.s8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c
> index f7b48c546a6..aa09d1831e0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16_t b)
>  {
>    return vqsubq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u16"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vqsub.u16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t a)
> +{
> +  return vqsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c
> index f74a968f5a7..19b62e3a8a5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32_t b)
>  {
>    return vqsubq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vqsub.u32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t a)
> +{
> +  return vqsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c
> index ce7b4ce0151..c8eeb38b266 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c
> @@ -1,21 +1,45 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8_t b)
>  {
>    return vqsubq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u8"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vqsub.u8	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t a)
> +{
> +  return vqsubq (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c
> index 85bf265eeb0..6c66b4d75d8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vqsubq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c
> index 35d17e8bc4e..8432197b9e8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vqsubq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c
> index 50cfccff7a5..ad16cae08bc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vqsubq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c
> index 15f0b7244b7..264df1a0398 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, uint16x8_t b)
>  {
>    return vqsubq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, uint16x8_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c
> index 7d695e23474..a4bf15cd9df 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, uint32x4_t b)
>  {
>    return vqsubq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, uint32x4_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c
> index c0552d100d4..1804d6484e2 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vqsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, uint8x16_t b)
>  {
>    return vqsubq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vqsub.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, uint8x16_t b)
>  {
>    return vqsubq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vqsub.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*
  2022-11-17 16:38 ` [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq* Andrea Corallo
@ 2022-11-22 17:03   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 17:03 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*
> 
> gcc/ChangeLog:
> 
> 	* config/arm/mve.md (mve_vrmlaldavhq_<supf>v4si,
> 	mve_vrmlaldavhaq_<supf>v4si): Fix spacing vs tabs.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c: Likewise.

Ok.
Thanks,
Kyrill
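
A note on the mve.md hunks quoted below: the expected bodies added in
this series write each instruction with a tab between the mnemonic and
its operands, so an output template that uses a space there would
presumably not match such a line. The following is only a sketch of
that check in plain C. The helper name mnemonic_followed_by_tab is made
up for illustration and is not part of GCC or its testsuite.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not GCC code): return true when LINE, after any
   leading blanks, starts with MNEMONIC immediately followed by a tab.  */
bool
mnemonic_followed_by_tab (const char *line, const char *mnemonic)
{
  assert (line != NULL && mnemonic != NULL);

  /* Skip the indentation that precedes the instruction.  */
  while (*line == ' ' || *line == '\t')
    line++;

  size_t len = strlen (mnemonic);

  /* The mnemonic must match in full and the next character must be a
     tab; a space separator is rejected.  */
  return strncmp (line, mnemonic, len) == 0 && line[len] == '\t';
}
```

With this sketch, a line such as "\tvrmlaldavha.s32\tr0, r1, q0, q1" is
accepted, while the same line with a space after the mnemonic is not.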

> ---
>  gcc/config/arm/mve.md                         |  4 +-
>  .../arm/mve/intrinsics/vrmlaldavhaq_p_s32.c   | 24 ++++++++++-
>  .../arm/mve/intrinsics/vrmlaldavhaq_p_u32.c   | 40 ++++++++++++++++++-
>  3 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index d2ffae6a425..b5e6da4b133 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -2543,7 +2543,7 @@ (define_insn "mve_vrmlaldavhq_<supf>v4si"
>  	 VRMLALDAVHQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vrmlaldavh.<supf>32 %Q0, %R0, %q1, %q2"
> +  "vrmlaldavh.<supf>32\t%Q0, %R0, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -2649,7 +2649,7 @@ (define_insn "mve_vrmlaldavhaq_<supf>v4si"
>  	 VRMLALDAVHAQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vrmlaldavha.<supf>32 %Q0, %R0, %q2, %q3"
> +  "vrmlaldavha.<supf>32\t%Q0, %R0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> index 263d3509771..dec4a969dfe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrmlaldavhat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>    return vrmlaldavhaq_p_s32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrmlaldavhat.s32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int64_t
>  foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>    return vrmlaldavhaq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> index 83ab68c001b..f3c8bfd121c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrmlaldavhat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>    return vrmlaldavhaq_p_u32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrmlaldavhat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint64_t
>  foo1 (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>    return vrmlaldavhaq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
> +/*
> +**foo2:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrmlaldavhat.u32	(?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
> +uint64_t
> +foo2 (uint32x4_t b, uint32x4_t c, mve_pred16_t p)
> +{
> +  return vrmlaldavhaq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 82+ messages in thread

* RE: [PATCH 34/35] arm: improve tests for vrshlq*
  2022-11-17 16:38 ` [PATCH 34/35] arm: improve tests for vrshlq* Andrea Corallo
@ 2022-11-22 17:04   ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 17:04 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 34/35] arm: improve tests for vrshlq*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c: Improve tests.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_u8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill
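
For context on what the new expected bodies add: the old tests used
separate scan-assembler directives for "vpst" and "vrshlt.*", which
say nothing about order. The new bodies list vmsr, vpst and vrshlt in
sequence, separated by "..." lines, which as far as I understand
check-function-bodies stand for any run of intervening lines. The
sketch below illustrates that ordering constraint in plain C. It is a
simplification that uses substring search instead of regular
expressions, and the function name appear_in_order is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the DejaGnu implementation): return true
   when every string in NEEDLES is found in LINES, each on a later line
   than the previous one.  Lines in between are ignored, loosely
   mirroring the "..." separators in the expected bodies.  */
bool
appear_in_order (const char *const *lines, size_t n_lines,
		 const char *const *needles, size_t n_needles)
{
  assert (lines != NULL && needles != NULL);

  size_t next = 0;	/* Index of the next needle still to be found.  */

  for (size_t i = 0; i < n_lines && next < n_needles; i++)
    if (strstr (lines[i], needles[next]) != NULL)
      next++;		/* Found it; later needles need later lines.  */

  return next == n_needles;
}
```

Given the lines "vmsr p0, r0", "vpst", "vrshlt.s16 q0, r1" in that
order, the sketch accepts the needle list {vmsr, vpst, vrshlt.s16}; if
vpst came before vmsr it would reject it, which two independent
scan-assembler checks would not notice.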

> ---
>  .../arm/mve/intrinsics/vrshlq_m_n_s16.c       | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_n_s32.c       | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_n_s8.c        | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u16.c       | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u32.c       | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u8.c        | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_s16.c         | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_s32.c         | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_s8.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_u16.c         | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_u32.c         | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_m_u8.c          | 26 ++++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_n_s16.c         | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_n_s32.c         | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_n_s8.c          | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_n_u16.c         | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_n_u32.c         | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_n_u8.c          | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_s16.c           | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_s32.c           | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vrshlq_s8.c | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_u16.c           | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_u32.c           | 16 ++++++++++--
>  .../gcc.target/arm/mve/intrinsics/vrshlq_u8.c | 16 ++++++++++--
>  .../arm/mve/intrinsics/vrshlq_x_s16.c         | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_x_s32.c         | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_x_s8.c          | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_x_u16.c         | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_x_u32.c         | 25 +++++++++++++++---
>  .../arm/mve/intrinsics/vrshlq_x_u8.c          | 25 +++++++++++++++---
>  30 files changed, 564 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> index cf51de6aa9c..c7d1f3a5b1c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c
> index dcfd99773e3..a8713e6a06a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c
> index cc1b746dc0d..8160d1bdb04 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c
> index 93a95ba9065..b08f4c076d1 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c
> index 4b8c82aba21..59f9a13d8c0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c
> index f1ff9dd33b7..fda65f7c592 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int32_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_n (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c
> index 57f343cd3b9..20c9f5fcd7c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c
> index 2598b1719fd..af7a5158458 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_s32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c
> index 6e4f1bdddf4..59d283ebb71 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_s8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c
> index d4d98913b75..e731cb71675 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_u16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c
> index 5d60f1fe799..0379e0455c9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t inactive, uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_u32 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t inactive, uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c
> index 913ba36c925..1e20486253e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t inactive, uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_m_u8 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t inactive, uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c
> index 713c6a218b2..c846e9f06ee 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int32_t b)
>  {
>    return vrshlq_n_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.s16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int32_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c
> index 18906fe44d1..1c6144212f7 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32_t b)
>  {
>    return vrshlq_n_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.s32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c
> index d5b1286d943..3b9d0a389dc 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int32_t b)
>  {
>    return vrshlq_n_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.s8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int32_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
> index 49bb21663d7..77994bd3a29 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int32_t b)
>  {
>    return vrshlq_n_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.u16	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int32_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
> index 8ed67395b42..82774c794fe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32_t b)
>  {
>    return vrshlq_n_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.u32	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
> index ccc6a00b98a..e9badb7297e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int32_t b)
>  {
>    return vrshlq_n_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.u8	q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int32_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
> index c28ad31c6f9..4a64fc7b410 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b)
>  {
>    return vrshlq_s16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
> index 2e279b6fb0a..c5cbe266c0f 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b)
>  {
>    return vrshlq_s32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
> index 4d18419d1bf..85305921f9a 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_s8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b)
>  {
>    return vrshlq_s8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.s8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
> index e0a9ea9cebc..905a18c4f20 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int16x8_t b)
>  {
>    return vrshlq_u16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int16x8_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
> index 788a4b1b6fa..16c7578df39 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u32.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32x4_t b)
>  {
>    return vrshlq_u32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32x4_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
> index d860e9cccb9..8bf21eeaef5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_u8.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vrshl.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int8x16_t b)
>  {
>    return vrshlq_u8 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vrshl.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int8x16_t b)
>  {
>    return vrshlq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vrshl.u8"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
> index 800a1e8e48f..4dfb6a65842 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_x_s16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
> index 921072a44c9..7f1f6dbb760 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_x_s32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
> index 217b257ed24..69bf0a50fa6 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_x_s8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.s8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.s8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
> index 5c0cad9ec89..b5a89892070 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_x_u16 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t a, int16x8_t b, mve_pred16_t p)
>  {
>    return vrshlq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
> index 2754d20841c..59ab2662021 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_x_u32 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u32	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo1 (uint32x4_t a, int32x4_t b, mve_pred16_t p)
>  {
>    return vrshlq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
> index 46dada44559..b81d8d03da4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_x_u8 (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vrshlt.u8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +**	vpst(?:	@.*|)
> +**	...
> +**	vrshlt.u8	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo1 (uint8x16_t a, int8x16_t b, mve_pred16_t p)
>  {
>    return vrshlq_x (a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* RE: [PATCH 35/35] arm: improve tests for vsetq_lane*
  2022-11-17 16:38 ` [PATCH 35/35] arm: improve tests for vsetq_lane* Andrea Corallo
@ 2022-11-22 17:06   ` Kyrylo Tkachov
  2022-11-24 14:43     ` [PATCH 35/35 V2] " Andrea Corallo
  0 siblings, 1 reply; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-22 17:06 UTC (permalink / raw)
  To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, Andrea Corallo



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Andrea Corallo <Andrea.Corallo@arm.com>
> Subject: [PATCH 35/35] arm: improve tests for vsetq_lane*
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
> ---
>  .../arm/mve/intrinsics/vsetq_lane_f16.c       | 36 +++++++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_f32.c       | 36 +++++++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_s16.c       | 24 ++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_s32.c       | 24 ++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_s64.c       | 27 ++++++++++---
>  .../arm/mve/intrinsics/vsetq_lane_s8.c        | 24 ++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_u16.c       | 36 +++++++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_u32.c       | 36 +++++++++++++++--
>  .../arm/mve/intrinsics/vsetq_lane_u64.c       | 39 ++++++++++++++++---
>  .../arm/mve/intrinsics/vsetq_lane_u8.c        | 36 +++++++++++++++--
>  10 files changed, 284 insertions(+), 34 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> index e03e9620528..b5c9f4d5eb8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float16x8_t
>  foo (float16_t a, float16x8_t b)
>  {
> -    return vsetq_lane_f16 (a, b, 0);
> +  return vsetq_lane_f16 (a, b, 1);
>  }
> 

Hmm, since these tests set an individual lane, we should be able to scan for more specific codegen here: the vmov pattern should name lane 1 explicitly instead of accepting any lane index, though the index may need to be flipped for big-endian.
Thanks,
Kyrill
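
To make the suggestion concrete, here is a minimal sketch of the difference between accepting any lane index and requiring one specific lane. It is not taken from the patch: the helper name sets_lane_16 and the sample assembly lines are hypothetical, and the pattern is a POSIX ERE approximation (no "(?:...)" groups, no trailing-comment alternative) of what a tightened check-function-bodies line might look like. The lane is a parameter so that a big-endian variant could pass a different index if that turns out to be needed.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: return true if LINE is a 16-bit vmov that writes
   exactly lane LANE of a q register from a core register.  The quoted
   tests use "\[[0-9]+\]", which accepts any lane; here the index is
   fixed to the lane the test actually sets.  */
static bool
sets_lane_16 (const char *line, int lane)
{
  char pattern[128];
  regex_t re;
  bool ok;

  /* A 128-bit vector holds eight 16-bit lanes.  */
  assert (lane >= 0 && lane < 8);

  snprintf (pattern, sizeof pattern,
	    "^\tvmov\\.16\tq[0-9]+\\[%d\\], (ip|fp|r[0-9]+)$", lane);

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

With this, a line such as "\tvmov.16\tq0[1], r0" is accepted for lane 1, while the same instruction writing lane 0 is rejected; the looser pattern in the quoted tests would accept both.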

> -/* { dg-final { scan-assembler "vmov.16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo1 (float16_t a, float16x8_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float16x8_t
> +foo2 (float16x8_t b)
> +{
> +  return vsetq_lane (1.1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> index 2b9f1a7e627..211083ce5d4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  float32x4_t
>  foo (float32_t a, float32x4_t b)
>  {
> -    return vsetq_lane_f32 (a, b, 0);
> +  return vsetq_lane_f32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo1 (float32_t a, float32x4_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +float32x4_t
> +foo2 (float32x4_t b)
> +{
> +  return vsetq_lane (1.1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
> index 92ad0dd16a8..9cdaeae1e74 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
> @@ -1,15 +1,33 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int16x8_t
>  foo (int16_t a, int16x8_t b)
>  {
> -    return vsetq_lane_s16 (a, b, 0);
> +  return vsetq_lane_s16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int16x8_t
> +foo1 (int16_t a, int16x8_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
> index e60c8f26700..edd06bce1bd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
> @@ -1,15 +1,33 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int32x4_t
>  foo (int32_t a, int32x4_t b)
>  {
> -    return vsetq_lane_s32 (a, b, 0);
> +  return vsetq_lane_s32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int32x4_t
> +foo1 (int32_t a, int32x4_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
> index 430df669f2a..95ba4da1f51 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
> @@ -1,16 +1,33 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
> -/* { dg-require-effective-target arm_hard_ok } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
> -/* { dg-additional-options "-mfloat-abi=hard -O2" } */
> +/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int64x2_t
>  foo (int64_t a, int64x2_t b)
>  {
> -    return vsetq_lane_s64 (a, b, 0);
> +  return vsetq_lane_s64 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int64x2_t
> +foo1 (int64_t a, int64x2_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
> index d8ccbb524fd..f5bf0dd663b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
> @@ -1,15 +1,33 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  int8x16_t
>  foo (int8_t a, int8x16_t b)
>  {
> -    return vsetq_lane_s8 (a, b, 0);
> +  return vsetq_lane_s8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +int8x16_t
> +foo1 (int8_t a, int8x16_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
> index 156a5d1de1b..33944dcbd45 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint16x8_t
>  foo (uint16_t a, uint16x8_t b)
>  {
> -    return vsetq_lane_u16 (a, b, 0);
> +  return vsetq_lane_u16 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.16"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo1 (uint16_t a, uint16x8_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t b)
> +{
> +  return vsetq_lane (1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
> index e9575483cc9..8f9a3a78cc5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint32x4_t
>  foo (uint32_t a, uint32x4_t b)
>  {
> -    return vsetq_lane_u32 (a, b, 0);
> +  return vsetq_lane_u32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.32"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo1 (uint32_t a, uint32x4_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmov.32	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint32x4_t
> +foo2 (uint32x4_t b)
> +{
> +  return vsetq_lane (1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
> index 0e040121cf0..5ce4c544c25 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
> @@ -1,16 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
> -/* { dg-require-effective-target arm_hard_ok } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
> -/* { dg-additional-options "-mfloat-abi=hard -O2" } */
> +/* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint64x2_t
>  foo (uint64_t a, uint64x2_t b)
>  {
> -    return vsetq_lane_u64 (a, b, 0);
> +  return vsetq_lane_u64 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint64x2_t
> +foo1 (uint64_t a, uint64x2_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint64x2_t
> +foo2 (uint64x2_t b)
> +{
> +  return vsetq_lane (1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
> index 668b3fea953..d37021c91b0 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**	...
> +**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
>  uint8x16_t
>  foo (uint8_t a, uint8x16_t b)
>  {
> -    return vsetq_lane_u8 (a, b, 0);
> +  return vsetq_lane_u8 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.8"  }  } */
> 
> +/*
> +**foo1:
> +**	...
> +**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo1 (uint8_t a, uint8x16_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**	...
> +**	vmov.8	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**	...
> +*/
> +uint8x16_t
> +foo2 (uint8x16_t b)
> +{
> +  return vsetq_lane (1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



* [PATCH 35/35 V2] arm: improve tests for vsetq_lane*
  2022-11-22 17:06   ` Kyrylo Tkachov
@ 2022-11-24 14:43     ` Andrea Corallo
  2022-11-24 15:28       ` Kyrylo Tkachov
  0 siblings, 1 reply; 82+ messages in thread
From: Andrea Corallo @ 2022-11-24 14:43 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: gcc-patches, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 1423 bytes --]

Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:

[...]

>> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> index e03e9620528..b5c9f4d5eb8 100644
>> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> @@ -1,15 +1,45 @@
>> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
>>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>>  /* { dg-add-options arm_v8_1m_mve_fp } */
>>  /* { dg-additional-options "-O2" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> 
>>  #include "arm_mve.h"
>> 
>> +/*
>> +**foo:
>> +**	...
>> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
>> +**	...
>> +*/
>>  float16x8_t
>>  foo (float16_t a, float16x8_t b)
>>  {
>> -    return vsetq_lane_f16 (a, b, 0);
>> +  return vsetq_lane_f16 (a, b, 1);
>>  }
>> 
>
> Hmm, for these tests we should be able to scan for more specific codegen as we're setting individual lanes, so we should be able to scan for lane 1 in the vmov instruction, though it may need to be flipped for big-endian.
> Thanks,
> Kyrill

Hi Kyrill,

please find attached the updated version of this patch.

Big-endian should not be a problem since, as far as I understand, it is
just not supported with MVE intrinsics.

Thanks!

  Andrea
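
As a rough illustration of the review point above, the difference between
the V1 and V2 expectations can be reproduced outside DejaGnu. The snippet
below is only a sketch in Python: the two patterns are copied from the
vsetq_lane_f16.c hunks, while the helper `line_matches` and the sample
assembly lines are made up for illustration and are not real compiler
output. It assumes that an expectation line has to match a whole assembly
line.

```python
import re

# V1 expectation: any lane index is accepted.
V1_ANY_LANE = r"\tvmov.16\tq[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:\t@.*|)"
# V2 expectation: only lane 1 is accepted.
V2_LANE_1 = r"\tvmov.16\tq[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:\t@.*|)"


def line_matches(pattern, asm_line):
    """Return True if the whole assembly line matches the pattern."""
    return re.fullmatch(pattern, asm_line) is not None


# Hypothetical assembly lines, written by hand for this example.
lane1 = "\tvmov.16\tq0[1], r0\t@ int"
lane0 = "\tvmov.16\tq0[0], r0"

print(line_matches(V1_ANY_LANE, lane0))  # True: V1 cannot tell the lanes apart
print(line_matches(V2_LANE_1, lane1))    # True: lane 1 is what the test sets
print(line_matches(V2_LANE_1, lane0))    # False: a wrong lane is now rejected
```

With the V1 pattern a compiler that wrote to the wrong lane would still
pass the test; the V2 pattern turns that case into a failure.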


[-- Attachment #2: 0001-arm-improve-tests-for-vsetq_lane.patch --]
[-- Type: text/plain, Size: 14336 bytes --]

From 79f2c990553a1f793e08b9a0c4abb7dae8de7120 Mon Sep 17 00:00:00 2001
From: Andrea Corallo <andrea.corallo@arm.com>
Date: Thu, 17 Nov 2022 11:06:29 +0100
Subject: [PATCH] arm: improve tests for vsetq_lane*

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vsetq_lane_f16.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_f32.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_s16.c       | 24 ++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_s32.c       | 24 ++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_s64.c       | 27 ++++++++++---
 .../arm/mve/intrinsics/vsetq_lane_s8.c        | 24 ++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_u16.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_u32.c       | 36 +++++++++++++++--
 .../arm/mve/intrinsics/vsetq_lane_u64.c       | 39 ++++++++++++++++---
 .../arm/mve/intrinsics/vsetq_lane_u8.c        | 36 +++++++++++++++--
 10 files changed, 284 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
index e03e9620528..6b148a4b03d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float16x8_t
 foo (float16_t a, float16x8_t b)
 {
-    return vsetq_lane_f16 (a, b, 0);
+  return vsetq_lane_f16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo1 (float16_t a, float16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float16x8_t
+foo2 (float16x8_t b)
+{
+  return vsetq_lane (1.1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
index 2b9f1a7e627..e4e7f892e97 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 float32x4_t
 foo (float32_t a, float32x4_t b)
 {
-    return vsetq_lane_f32 (a, b, 0);
+  return vsetq_lane_f32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo1 (float32_t a, float32x4_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+float32x4_t
+foo2 (float32x4_t b)
+{
+  return vsetq_lane (1.1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
index 92ad0dd16a8..950cd016b76 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
@@ -1,15 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int16x8_t
 foo (int16_t a, int16x8_t b)
 {
-    return vsetq_lane_s16 (a, b, 0);
+  return vsetq_lane_s16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int16x8_t
+foo1 (int16_t a, int16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
index e60c8f26700..6b49ccd91e4 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
@@ -1,15 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int32x4_t
 foo (int32_t a, int32x4_t b)
 {
-    return vsetq_lane_s32 (a, b, 0);
+  return vsetq_lane_s32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int32x4_t
+foo1 (int32_t a, int32x4_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
index 430df669f2a..95ba4da1f51 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
@@ -1,16 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
-/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-mfloat-abi=hard -O2" } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int64x2_t
 foo (int64_t a, int64x2_t b)
 {
-    return vsetq_lane_s64 (a, b, 0);
+  return vsetq_lane_s64 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int64x2_t
+foo1 (int64_t a, int64x2_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
index d8ccbb524fd..91a5baee55f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
@@ -1,15 +1,33 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.8	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 int8x16_t
 foo (int8_t a, int8x16_t b)
 {
-    return vsetq_lane_s8 (a, b, 0);
+  return vsetq_lane_s8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.8	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+int8x16_t
+foo1 (int8_t a, int8x16_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
index 156a5d1de1b..53986a5c1b1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint16x8_t
 foo (uint16_t a, uint16x8_t b)
 {
-    return vsetq_lane_u16 (a, b, 0);
+  return vsetq_lane_u16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo1 (uint16_t a, uint16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.16	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint16x8_t
+foo2 (uint16x8_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
index e9575483cc9..3f17db9623a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint32x4_t
 foo (uint32_t a, uint32x4_t b)
 {
-    return vsetq_lane_u32 (a, b, 0);
+  return vsetq_lane_u32 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.32"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo1 (uint32_t a, uint32x4_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.32	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint32x4_t
+foo2 (uint32x4_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
index 0e040121cf0..5ce4c544c25 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
@@ -1,16 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
-/* { dg-require-effective-target arm_hard_ok } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
-/* { dg-additional-options "-mfloat-abi=hard -O2" } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint64x2_t
 foo (uint64_t a, uint64x2_t b)
 {
-    return vsetq_lane_u64 (a, b, 0);
+  return vsetq_lane_u64 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint64x2_t
+foo1 (uint64_t a, uint64x2_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov	d[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint64x2_t
+foo2 (uint64x2_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
index 668b3fea953..58e932b85e8 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+**	...
+**	vmov.8	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
 uint8x16_t
 foo (uint8_t a, uint8x16_t b)
 {
-    return vsetq_lane_u8 (a, b, 0);
+  return vsetq_lane_u8 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.8"  }  } */
 
+/*
+**foo1:
+**	...
+**	vmov.8	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo1 (uint8_t a, uint8x16_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+**	...
+**	vmov.8	q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:	@.*|)
+**	...
+*/
+uint8x16_t
+foo2 (uint8x16_t b)
+{
+  return vsetq_lane (1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
-- 
2.25.1



* RE: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*
  2022-11-24 14:43     ` [PATCH 35/35 V2] " Andrea Corallo
@ 2022-11-24 15:28       ` Kyrylo Tkachov
  0 siblings, 0 replies; 82+ messages in thread
From: Kyrylo Tkachov @ 2022-11-24 15:28 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: gcc-patches, Richard Earnshaw



> -----Original Message-----
> From: Andrea Corallo <andrea.corallo@arm.com>
> Sent: Thursday, November 24, 2022 2:44 PM
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> <Richard.Earnshaw@arm.com>
> Subject: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*
> 
> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
> 
> [...]
> 
> >> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> index e03e9620528..b5c9f4d5eb8 100644
> >> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> @@ -1,15 +1,45 @@
> >> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
> >>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> >>  /* { dg-add-options arm_v8_1m_mve_fp } */
> >>  /* { dg-additional-options "-O2" } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >>
> >>  #include "arm_mve.h"
> >>
> >> +/*
> >> +**foo:
> >> +**	...
> >> +**	vmov.16	q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?:	@.*|)
> >> +**	...
> >> +*/
> >>  float16x8_t
> >>  foo (float16_t a, float16x8_t b)
> >>  {
> >> -    return vsetq_lane_f16 (a, b, 0);
> >> +  return vsetq_lane_f16 (a, b, 1);
> >>  }
> >>
> >
> > Hmm, for these tests we should be able to scan for more specific codegen
> > as we're setting individual lanes, so we should be able to scan for lane 1 in
> > the vmov instruction, though it may need to be flipped for big-endian.
> > Thanks,
> > Kyrill
> 
> Hi Kyrill,
> 
> please find attached the updated version of this patch.
> 
> Big-endian should not be a problem since, as far as I understand, it is
> just not supported with MVE intrinsics.

Huh, that's right.
This version is ok.
Thanks!
Kyrill

> 
> Thanks!
> 
>   Andrea



* Re: [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk)
  2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
                   ` (34 preceding siblings ...)
  2022-11-17 16:38 ` [PATCH 35/35] arm: improve tests for vsetq_lane* Andrea Corallo
@ 2022-11-28  9:20 ` Andrea Corallo
  35 siblings, 0 replies; 82+ messages in thread
From: Andrea Corallo @ 2022-11-28  9:20 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: gcc-patches, Richard.Earnshaw, stam.markianos-wright

Andrea Corallo <andrea.corallo@arm.com> writes:

> Hi all,
>
> this is the first patch series about improving the current MVE
> implementation and testsuite for:
>
> - Complete intrinsic implementation and coverage (the list of intrinsics is
>   specified by [1])
> - Verifying all instructions supposedly emitted by each intrinsic
> - Verifying register usage
> - Fixing the current scan assemblers to really match the wanted mnemonics
> - Verifying no external calls are emitted
>
> This series fixes the backend where necessary.
>
> Best Regards
>
>   Andrea

Hi Kyrill,

thanks for reviewing the series!

With the requested changes this is now in trunk as of f2b54e5b796.

Best Regards

  Andrea
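
For readers unfamiliar with the style of test used throughout the series,
the following Python sketch models, in a simplified way, the two checks
that each reworked test performs: a function-body expectation in which
`...` stands for any run of lines, and a scan that no `__ARM_undef`
reference is left in the output. This is not DejaGnu's implementation of
check-function-bodies or scan-assembler-not; the helper names
(`body_matches`, `no_undef_reference`) and the sample assembly are
assumptions made up for the example.

```python
import re

# Expected body of foo, taken from the V2 vsetq_lane_f16.c test, with the
# "**" prefix and the function-name line already stripped.
FOO_BODY = [
    "...",
    r"\tvmov.16\tq[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:\t@.*|)",
    "...",
]


def body_matches(expected, asm_body):
    """Check asm_body against expected lines; '...' matches any run of lines."""
    parts = []
    for line in expected:
        if line.strip() == "...":
            parts.append(r"(?:.*\n)*?")  # zero or more whole lines
        else:
            parts.append(line + r"\n")   # exactly one line matching the regex
    return re.fullmatch("".join(parts), asm_body) is not None


def no_undef_reference(asm_text):
    """Model of the final scan: the marker symbol must not appear at all."""
    return "__ARM_undef" not in asm_text


# Hand-written stand-in for a function body; not real compiler output.
GOOD_ASM = "\tvmov.16\tq0[1], r0\t@ int\n\tbx\tlr\n"

print(body_matches(FOO_BODY, GOOD_ASM))
print(no_undef_reference(GOOD_ASM))
```

The body check fails as soon as the `vmov` line is missing or writes a
different lane, which is the kind of regression a bare mnemonic scan such
as the old `scan-assembler "vmov.16"` would not notice.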



Thread overview: 82+ messages
2022-11-17 16:37 [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
2022-11-17 16:37 ` [PATCH 01/35] arm: improve vcreateq* tests Andrea Corallo
2022-11-18  9:47   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization Andrea Corallo
2022-11-18 16:33   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 03/35] arm: improve tests and fix vddupq* Andrea Corallo
2022-11-18 16:34   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 04/35] arm: improve tests and fix vdwdupq* Andrea Corallo
2022-11-18 16:35   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 05/35] arm: improve vidupq* tests Andrea Corallo
2022-11-18 16:36   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 06/35] arm: improve tests and fix vdupq* Andrea Corallo
2022-11-18 16:37   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 07/35] arm: improve tests and fix vcmp* Andrea Corallo
2022-11-18 16:40   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 08/35] arm: improve tests for vmin* Andrea Corallo
2022-11-18 16:41   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 09/35] arm: improve tests for vmax* Andrea Corallo
2022-11-18 16:42   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 10/35] arm: improve tests for vabavq* Andrea Corallo
2022-11-18 16:43   ` Kyrylo Tkachov
2022-11-21 14:49     ` Andrea Corallo
2022-11-17 16:37 ` [PATCH 11/35] arm: improve tests for vabdq* Andrea Corallo
2022-11-18 16:44   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 12/35] arm: improve tests and fix vabsq* Andrea Corallo
2022-11-18 16:45   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic Andrea Corallo
2022-11-18 16:49   ` Kyrylo Tkachov
2022-11-21 10:45     ` Stam Markianos-Wright
2022-11-17 16:37 ` [PATCH 14/35] arm: propagate fixed overloading of MVE intrinsic scalar parameters Andrea Corallo
2022-11-18 16:51   ` Kyrylo Tkachov
2022-11-21 10:46     ` Stam Markianos-Wright
2022-11-17 16:37 ` [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515] Andrea Corallo
2022-11-18 16:58   ` Kyrylo Tkachov
2022-11-20 22:49     ` Ramana Radhakrishnan
2022-11-21 14:11       ` Stam Markianos-Wright
2022-11-21 10:45     ` Stam Markianos-Wright
2022-11-17 16:37 ` [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic Andrea Corallo
2022-11-22 10:00   ` Christophe Lyon
2022-11-22 10:54     ` Andrea Corallo
2022-11-22 16:48   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 17/35] arm: improve tests and fix vadd* Andrea Corallo
2022-11-22 16:49   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 18/35] arm: improve tests for vmulq* Andrea Corallo
2022-11-22 16:51   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 19/35] arm: improve tests and fix vsubq* Andrea Corallo
2022-11-22 16:51   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 20/35] arm: improve tests for vfmasq_m* Andrea Corallo
2022-11-22 16:52   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 21/35] arm: improve tests for vhaddq_m* Andrea Corallo
2022-11-22 16:53   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 22/35] arm: improve tests for vhsubq_m* Andrea Corallo
2022-11-22 16:53   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 23/35] arm: improve tests for viwdupq* Andrea Corallo
2022-11-22 16:54   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 24/35] arm: improve tests for vmladavaq* Andrea Corallo
2022-11-22 16:54   ` Kyrylo Tkachov
2022-11-17 16:37 ` [PATCH 25/35] arm: improve tests and fix vmlaldavaxq* Andrea Corallo
2022-11-22 16:56   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 26/35] arm: improve tests for vmlasq* Andrea Corallo
2022-11-22 16:56   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 27/35] arm: improve tests for vqaddq_m* Andrea Corallo
2022-11-22 16:57   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 28/35] arm: improve tests for vqdmlahq_m* Andrea Corallo
2022-11-22 16:57   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 29/35] arm: improve tests for vqdmul* Andrea Corallo
2022-11-22 16:58   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 30/35] arm: improve tests for vqrdmlahq* Andrea Corallo
2022-11-22 17:01   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 31/35] arm: improve tests for vqrdmlashq_m* Andrea Corallo
2022-11-22 17:02   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 32/35] arm: improve tests for vqsubq* Andrea Corallo
2022-11-22 17:03   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq* Andrea Corallo
2022-11-22 17:03   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 34/35] arm: improve tests for vrshlq* Andrea Corallo
2022-11-22 17:04   ` Kyrylo Tkachov
2022-11-17 16:38 ` [PATCH 35/35] arm: improve tests for vsetq_lane* Andrea Corallo
2022-11-22 17:06   ` Kyrylo Tkachov
2022-11-24 14:43     ` [PATCH 35/35 V2] " Andrea Corallo
2022-11-24 15:28       ` Kyrylo Tkachov
2022-11-28  9:20 ` [PATCH 00/35] arm: rework MVE testsuite and rework backend where necessary (1st chunk) Andrea Corallo
