Hi,

Recent refactoring of the arm_neon.h header enabled better code
generation for intrinsics that manipulate vector structures. New tests
were also added to verify the benefit of these changes.

It now transpires that the code generation improvements are observed
only on little-endian systems. This patch restricts the code generation
tests to little-endian targets (for now.)

Ok for master?

Thanks,
Jonathan

---

gcc/testsuite/ChangeLog:

2021-08-04  Jonathan Wright

        * gcc.target/aarch64/vector_structure_intrinsics.c: Restrict
        tests to little-endian targets.

From: Christophe Lyon
Sent: 03 August 2021 10:42
To: Jonathan Wright
Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
Subject: Re: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches wrote:

> Hi,
>
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
>
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbl[234] intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-08  Jonathan Wright
>
>         * config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
>         instead of constructing __builtin_aarch64_simd_oi one vector
>         at a time.
>         (vqtbl2_u8): Likewise.
>         (vqtbl2_p8): Likewise.
>         (vqtbl2q_s8): Likewise.
>         (vqtbl2q_u8): Likewise.
>         (vqtbl2q_p8): Likewise.
>         (vqtbl3_s8): Use __builtin_memcpy instead of constructing
>         __builtin_aarch64_simd_ci one vector at a time.
>         (vqtbl3_u8): Likewise.
>         (vqtbl3_p8): Likewise.
>         (vqtbl3q_s8): Likewise.
>         (vqtbl3q_u8): Likewise.
>         (vqtbl3q_p8): Likewise.
>         (vqtbl4_s8): Use __builtin_memcpy instead of constructing
>         __builtin_aarch64_simd_xi one vector at a time.
>         (vqtbl4_u8): Likewise.
>         (vqtbl4_p8): Likewise.
>         (vqtbl4q_s8): Likewise.
>         (vqtbl4q_u8): Likewise.
>         (vqtbl4q_p8): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vector_structure_intrinsics.c: New test.

Hi,

This new test fails on aarch64_be:

 FAIL: gcc.target/aarch64/vector_structure_intrinsics.c scan-assembler-not mov\\t

Can you check?

Thanks

Christophe
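
For readers without the patch to hand, the two copying styles described in
the quoted message can be sketched in portable C. This is only an
illustration, not the arm_neon.h code: vec16_t, vec16x2_t, opaque_oi_t and
set_qreg are hypothetical stand-ins for int8x16_t, int8x16x2_t,
__builtin_aarch64_simd_oi and the per-register set builtin, and plain byte
arrays replace real vector registers, so nothing here shows the actual
code generation difference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real arm_neon.h types. */
typedef struct { uint8_t b[16]; } vec16_t;        /* plays the role of int8x16_t */
typedef struct { vec16_t val[2]; } vec16x2_t;     /* plays the role of int8x16x2_t */
typedef struct { uint8_t raw[32]; } opaque_oi_t;  /* plays the role of the opaque OI structure */

/* The single-copy approach relies on both layouts having the same size. */
_Static_assert(sizeof(vec16x2_t) == sizeof(opaque_oi_t),
               "table and opaque structure must have the same size");

/* Hypothetical per-register "set": writes one vector into slot idx. */
opaque_oi_t set_qreg(opaque_oi_t o, vec16_t v, int idx)
{
    for (int i = 0; i < 16; i++)
        o.raw[16 * idx + i] = v.b[i];
    return o;
}

/* Old style: build the opaque structure one vector at a time. */
opaque_oi_t pack_per_vector(vec16x2_t tab)
{
    opaque_oi_t o = {{0}};
    o = set_qreg(o, tab.val[0], 0);
    o = set_qreg(o, tab.val[1], 1);
    return o;
}

/* New style: copy the whole table in one go. */
opaque_oi_t pack_memcpy(vec16x2_t tab)
{
    opaque_oi_t o;
    memcpy(&o, &tab, sizeof(o));
    return o;
}

/* Both styles must produce byte-identical opaque structures. */
void self_check(void)
{
    vec16x2_t t;
    for (int i = 0; i < 32; i++)
        t.val[i / 16].b[i % 16] = (uint8_t)i;
    opaque_oi_t a = pack_per_vector(t);
    opaque_oi_t b = pack_memcpy(t);
    assert(memcmp(&a, &b, sizeof(a)) == 0);
}
```

Both functions yield the same bytes; the difference discussed in the thread
is only in what the compiler emits for the real intrinsics.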