On Wed, 22 Sep 2010 23:54:38 +0100
Richard Earnshaw <Richard.Earnshaw@buzzard.freeserve.co.uk> wrote:

> On 03/08/10 17:32, Julian Brown wrote:
> > On Mon, 7 Jun 2010 20:08:48 +0100
> > Julian Brown <julian@codesourcery.com> wrote:
> > 
> >>>>> This is a new version of the patch, which adds movmisalign
> >>>>> patterns for little-endian NEON, and uses a new (since the last
> >>>>> version of the patch was posted) target hook
> >>>>> (TARGET_SUPPORT_VECTOR_MISALIGNMENT) to describe the alignments
> >>>>> supported by NEON.
> > 
> > The previously-posted version of this patch no longer works on
> > current mainline, so here's a new version which does.
> > 
> > Backing up to the start of the problem, since it's been a while --
> > this patch adds several things to NEON support in the ARM backend:
> > 
> > 1. Implementations of the movmisalign pattern for loading and
> > storing vectors which are not naturally aligned.
> > 
> > 2. Constraint/operand printing tweaks to disallow pre-decrement for 
> > addresses used by the above, and allow printing of alignment
> > specifiers for same.
> > 
> > 3. Implementations of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> > and TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE, to tell the
> > middle-end which alignments are supported for vector loads/stores
> > by the hardware.
> > 
> > 4. Testsuite tweaks to specify that certain tests only require
> > vectors to be aligned to the natural alignment of their elements,
> > but not necessarily less than that. Also tweaks to force some tests
> > to use the -mvectorize-with-neon-quad option.
> > 
> >> [...] there's still an assumption that elements from increasing
> >> memory locations go in increasing lane numbers (which is only true
> >> in little-endian mode for NEON at present), but I don't think this
> >> patch makes things any worse. Fixing big-endian mode is another
> >> problem for another day :-).
> > 
> > This still holds, but as previously discussed, probably should not
> > be a sticking point for getting this patch applied.
> > 
> > There remains a small amount of noise in testsuite results with this
> > patch, i.e.:
> > 
> > PASS -> FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/ve
> > ct/vect-72.c scan-tree-dump-times vect "Alignment of access forced
> > using peeling " 0
> > 
> > This fails because a loop containing both an unaligned load and an
> > unaligned store is unpeeled, making the load aligned. It seems to
> > be a valid thing to do, so I'm not sure why it's a failure.
> > 
> > New FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc
> > scan-tree-dump-times vect "vectorized 1 loops" 1 New FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/g++.sum:g++.dg/vect/pr36648.cc
> > scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> > 
> > These were analysed in:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01351.html
> > 
> > New FAIL:
> > mthumb-march_armv7-a-mfpu_neon-mfloat-abi_softfp/gcc.sum:gcc.dg/vect/vect-outer-4c.c
> > scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> > 
> > and this in:
> > 
> >   http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01328.html
> > 
> > Also several tests transition from XPASS to PASS.
> > 
> > Tested with cross to ARM Linux (-mthumb -march=armv7-a -mfpu=neon
> > -mfloat-abi=softfp), gcc/g++/libstdc++. OK to apply?
> > 
> > ChangeLog
> > 
> >     gcc/
> >     * expr.c (expand_assignment): Add assertion to prevent emitting
> > null rtx for movmisalign pattern.
> >     (expand_expr_real_1): Likewise.
> >     * config/arm/arm.c (arm_builtin_support_vector_misalignment):
> > New. (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above.
> >     (arm_vector_alignment_reachable): New.
> >     (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above.
> >     (neon_vector_mem_operand): Disallow PRE_DEC for misaligned
> > loads. (arm_print_operand): Include alignment qualifier in %A.
> >     * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant.
> >     (movmisalign<mode>): New expander.
> >     (movmisalign<mode>_neon_store, movmisalign<mode>_neon_load): New
> >     insn patterns.
> > 
> >     gcc/testsuite/
> >     * gcc.dg/vect/vect-42.c: Use vect_element_align instead of
> >     vect_hw_misalign.
> >     * gcc.dg/vect/vect-60.c: Likewise.
> >     * gcc.dg/vect/vect-56.c: Likewise.
> >     * gcc.dg/vect/vect-93.c: Likewise.
> >     * gcc.dg/vect/no-scevccp-outer-8.c: Likewise.
> >     * gcc.dg/vect/vect-95.c: Likewise.
> >     * gcc.dg/vect/vect-96.c: Likewise.
> >     * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when
> > available.
> >     * gcc.dg/vect/slp-25.c: Likewise.
> >     * gcc.dg/vect/slp-3.c: Likewise.
> >     * gcc.dg/vect/vect-multitypes-1.c: Likewise.
> >     * gcc.dg/vect/no-vfa-pr29145.c: Likewise.
> >     * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use
> > vect_element_align.
> >     * gcc.dg/vect/vect-109.c: Likewise.
> >     * gcc.dg/vect/vect-peel-1.c: Likewise.
> >     * gcc.dg/vect/vect-peel-2.c: Likewise.
> >     * lib/target-supports.exp
> >     (check_effective_target_arm_vect_no_misalign): New.
> >     (check_effective_target_vect_no_align): Use above.
> >     (check_effective_target_vect_element_align): New.
> >     (add_options_for_quad_vectors): New.
> 
> 
> I've spent a long time pondering this patch and I'm still not entirely
> happy that forcing the vectorizer to pretend these operations are
> unaligned is the correct way to specify this, but I must admit that I
> can't see a reasonable alternative at the moment that isn't
> significantly less pleasant in some other respect.  So this is OK
> apart from:
> 
> +	if (align_bits != 0)
> +	  asm_fprintf (stream, ", :%d", align_bits);
> 
> The comma is incorrect in the alignment syntax.  The correct form is
> [Rn:align].  That is, the ':' is a direct replacement for '@' in the
> strict UAL form.

Fixed.

Here's the version I'm about to commit, re-tested lightly. It only
differs in trivial ways from the previously-posted version, in order to
apply to current mainline.

Thanks,

Julian

ChangeLog

    gcc/
    * expr.c (expand_assignment): Add assertion to prevent emitting
    null rtx for movmisalign pattern.
    (expand_expr_real_1): Likewise.
    * config/arm/arm.c (arm_builtin_support_vector_misalignment): New.
    (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): New. Use above.
    (arm_vector_alignment_reachable): New.
    (TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE): New. Use above.
    (neon_vector_mem_operand): Disallow PRE_DEC for misaligned loads.
    (arm_print_operand): Include alignment qualifier in %A.
    * config/arm/neon.md (UNSPEC_MISALIGNED_ACCESS): New constant.
    (movmisalign<mode>): New expander.
    (movmisalign<mode>_neon_store, movmisalign<mode>_neon_load): New
    insn patterns.

    gcc/testsuite/
    * gcc.dg/vect/vect-42.c: Use vect_element_align instead of
    vect_hw_misalign.
    * gcc.dg/vect/vect-60.c: Likewise.
    * gcc.dg/vect/vect-56.c: Likewise.
    * gcc.dg/vect/vect-93.c: Likewise.
    * gcc.dg/vect/no-scevccp-outer-8.c: Likewise.
    * gcc.dg/vect/vect-95.c: Likewise.
    * gcc.dg/vect/vect-96.c: Likewise.
    * gcc.dg/vect/vect-outer-5.c: Use quad-word vectors when available.
    * gcc.dg/vect/slp-25.c: Likewise.
    * gcc.dg/vect/slp-3.c: Likewise.
    * gcc.dg/vect/vect-multitypes-1.c: Likewise.
    * gcc.dg/vect/no-vfa-pr29145.c: Likewise.
    * gcc.dg/vect/vect-multitypes-4.c: Likewise. Use vect_element_align.
    * gcc.dg/vect/vect-109.c: Likewise.
    * gcc.dg/vect/vect-peel-1.c: Likewise.
    * gcc.dg/vect/vect-peel-2.c: Likewise.
    * lib/target-supports.exp
    (check_effective_target_arm_vect_no_misalign): New.
    (check_effective_target_vect_no_align): Use above.
    (check_effective_target_vect_element_align): New.
    (add_options_for_quad_vectors): New.