On Wed, 22 Sep 2010 23:54:38 +0100 Richard Earnshaw wrote: > On 03/08/10 17:32, Julian Brown wrote: > > On Mon, 7 Jun 2010 20:08:48 +0100 > > Julian Brown wrote: > > > >>>>> This is a new version of the patch, which adds movmisalign > >>>>> patterns for little-endian NEON, and uses a new (since the last > >>>>> version of the patch was posted) target hook > >>>>> (TARGET_SUPPORT_VECTOR_MISALIGNMENT) to describe the alignments > >>>>> supported by NEON. > > > > The previously-posted version of this patch no longer works on > > current mainline, so here's a new version which does. > > > > Backing up to the start of the problem, since it's been a while -- > > this patch adds several things to NEON support in the ARM backend: > > > > 1. Implementations of the movmisalign pattern for loading and > > storing vectors which are not naturally aligned. > > > > 2. Constraint/operand printing tweaks to disallow pre-decrement for > > addresses used by the above, and allow printing of alignment > > specifiers for same. > > > > 3. Implementations of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT > > and TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE, to tell the > > middle-end which alignments are supported for vector loads/stores > > by the hardware. > > > > 4. Testsuite tweaks to specify that certain tests only require > > vectors to be aligned to the natural alignment of their elements, > > but not necessarily less than that. Also tweaks to force some tests > > to use the -mvectorize-with-neon-quad option. > > > >> [...] there's still an assumption that elements from increasing > >> memory locations go in increasing lane numbers (which is only true > >> in little-endian mode for NEON at present), but I don't think this > >> patch makes things any worse. Fixing big-endian mode is another > >> problem for another day :-). > > > > This still holds, but as previously discussed, probably should not > > be a sticking point for getting this patch applied. > > > > There remains a small amount of noise in testsuite results with this > > patch, i.e.: > > > > PASS -> FAIL: > > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/ve > > ct/vect-72.c scan-tree-dump-times vect "Alignment of access forced > > using peeling " 0 > > > > This fails because a loop containing both an unaligned load and an > > unaligned store is unpeeled, making the load aligned. It seems to > > be a valid thing to do, so I'm not sure why it's a failure. > > > > New FAIL: > > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc > > scan-tree-dump-times vect "vectorized 1 loops" 1 New FAIL: > > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc > > scan-tree-dump-times vect "vectorizing stmts using SLP" 1 > > > > These were analysed in: > > > > http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01351.html > > > > New FAIL: > > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/vect/vect-outer-4c.c > > scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1 > > > > and this in: > > > > http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01328.html > > > > Also several tests transition from XPASS to PASS. > > > > Tested with cross to ARM Linux (-mthumb -march=armv7-a -mfpu=neon > > -mfloat-abi=softfp), gcc/g++/libstdc++. OK to apply? > > > > ChangeLog > > > > gcc/ > > * expr.c (expand_assignment): Add assertion to prevent emitting > > null rtx for movmisalign pattern. > > (expand_expr_real_1): Likewise. > > * config/arm/arm.c (arm_builtin_support_vector_misalignment): > > New. (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above. > > (arm_vector_alignment_reachable): New. > > (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above. > > (neon_vector_mem_operand): Disallow PRE_DEC for misaligned > > loads. (arm_print_operand): Include alignment qualifier in %A. > > * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant. > > (movmisalign): New expander. > > (movmisalign_neon_store, movmisalign_neon_load): New > > insn patterns. > > > > gcc/testsuite/ > > * gcc.dg/vect/vect-42.c: Use vect_element_align instead of > > vect_hw_misalign. > > * gcc.dg/vect/vect-60.c: Likewise. > > * gcc.dg/vect/vect-56.c: Likewise. > > * gcc.dg/vect/vect-93.c: Likewise. > > * gcc.dg/vect/no-scevccp-outer-8.c: Likewise. > > * gcc.dg/vect/vect-95.c: Likewise. > > * gcc.dg/vect/vect-96.c: Likewise. > > * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when > > available. > > * gcc.dg/vect/slp-25.c: Likewise. > > * gcc.dg/vect/slp-3.c: Likewise. > > * gcc.dg/vect/vect-multitypes-1.c: Likewise. > > * gcc.dg/vect/no-vfa-pr29145.c: Likewise. > > * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use > > vect_element_align. > > * gcc.dg/vect/vect-109.c: Likewise. > > * gcc.dg/vect/vect-peel-1.c: Likewise. > > * gcc.dg/vect/vect-peel-2.c: Likewise. > > * lib/target-supports.exp > > (check_effective_target_arm_vect_no_misalign): New. > > (check_effective_target_vect_no_align): Use above. > > (check_effective_target_vect_element_align): New. > > (add_options_for_quad_vectors): New. > > > I've spent a long time pondering this patch and I'm still not entirely > happy that forcing the vectorizer to pretend these operations are > unaligned is the correct way to specify this, but I must admit that I > can't see a reasonable alternative at the moment that isn't > significantly less pleasant in some other respect. So this is OK > apart from: > > + if (align_bits != 0) > + asm_fprintf (stream, ", :%d", align_bits); > > The comma is incorrect in the alignment syntax. The correct form is > [Rn:align]. That is, the ':' is a direct replacement for '@' in the > strict UAL form. Fixed. Here's the version I'm about to commit, re-tested lightly. It only differs in trivial ways from the previously-posted version, in order to apply to current mainline. Thanks, Julian ChangeLog gcc/ * expr.c (expand_assignment): Add assertion to prevent emitting null rtx for movmisalign pattern. (expand_expr_real_1): Likewise. * config/arm/arm.c (arm_builtin_support_vector_misalignment): New. (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above. (arm_vector_alignment_reachable): New. (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above. (neon_vector_mem_operand): Disallow PRE_DEC for misaligned loads. (arm_print_operand): Include alignment qualifier in %A. * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant. (movmisalign): New expander. (movmisalign_neon_store, movmisalign_neon_load): New insn patterns. gcc/testsuite/ * gcc.dg/vect/vect-42.c: Use vect_element_align instead of vect_hw_misalign. * gcc.dg/vect/vect-60.c: Likewise. * gcc.dg/vect/vect-56.c: Likewise. * gcc.dg/vect/vect-93.c: Likewise. * gcc.dg/vect/no-scevccp-outer-8.c: Likewise. * gcc.dg/vect/vect-95.c: Likewise. * gcc.dg/vect/vect-96.c: Likewise. * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when available. * gcc.dg/vect/slp-25.c: Likewise. * gcc.dg/vect/slp-3.c: Likewise. * gcc.dg/vect/vect-multitypes-1.c: Likewise. * gcc.dg/vect/no-vfa-pr29145.c: Likewise. * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use vect_element_align. * gcc.dg/vect/vect-109.c: Likewise. * gcc.dg/vect/vect-peel-1.c: Likewise. * gcc.dg/vect/vect-peel-2.c: Likewise. * lib/target-supports.exp (check_effective_target_arm_vect_no_misalign): New. (check_effective_target_vect_no_align): Use above. (check_effective_target_vect_element_align): New. (add_options_for_quad_vectors): New.