Hi all,

This patch implements the ACLE intrinsics vget_low_bf16 and vget_high_bf16,
which extract the lower or upper half of a bfloat16x8 vector.

vget_high_bf16 is implemented with a 'dup' instruction. vget_low_bf16 could
be implemented with a 'dup' or a 'mov', but it is mostly optimized out, since
the lower half of the vector register can be used directly.

The test for vget_low_bf16 only checks that the intrinsic compiles. No
instruction is checked, since none is generated in the test case.

The Arm ACLE documentation is at
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Denni

gcc/ChangeLog:

2020-10-29  Dennis Zhang

	* config/aarch64/aarch64-simd-builtins.def (vget_half): New entry.
	* config/aarch64/aarch64-simd.md (aarch64_vget_halfv8bf): New entry.
	* config/aarch64/arm_neon.h (vget_low_bf16): New intrinsic.
	(vget_high_bf16): Likewise.
	* config/aarch64/predicates.md (aarch64_zero_or_1): New predicate
	for zero or one immediate to indicate the lower or higher half.

gcc/testsuite/ChangeLog:

2020-10-29  Dennis Zhang

	* gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
	(test_vget_low_bf16, test_vget_high_bf16): New tests.
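
For reference, the lane selection described above can be sketched in C as follows. This is only an illustration, not code from the patch. The wrappers low_half and high_half and the model_* types and function are hypothetical names. The portable model treats each bfloat16 lane as a raw 16-bit pattern, and its 0/1 selector mirrors the zero-or-one immediate mentioned in the ChangeLog. The guarded part assumes a compiler and target with BF16 support enabled.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>

/* Thin wrappers (hypothetical names) around the two new intrinsics.  */
static inline bfloat16x4_t
low_half (bfloat16x8_t v)
{
  return vget_low_bf16 (v);	/* lanes 0..3 of V */
}

static inline bfloat16x4_t
high_half (bfloat16x8_t v)
{
  return vget_high_bf16 (v);	/* lanes 4..7 of V */
}
#endif

/* Hypothetical portable model of the lane semantics; each lane holds the
   raw 16-bit bfloat16 bit pattern.  */
typedef struct { uint16_t lane[8]; } model_bf16x8;
typedef struct { uint16_t lane[4]; } model_bf16x4;

/* HALF is 0 for the lower half and 1 for the upper half.  */
static inline model_bf16x4
model_get_half (model_bf16x8 v, int half)
{
  model_bf16x4 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = v.lane[half * 4 + i];
  return r;
}
```

With lanes numbered 0 to 7, model_get_half (v, 0) returns lanes 0 to 3 and model_get_half (v, 1) returns lanes 4 to 7, matching what vget_low_bf16 and vget_high_bf16 are expected to return.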