Re: [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Charles Baylis <charles.baylis@linaro.org>
To: Tejas Belagod <tejas.belagod@arm.com>
Cc: Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
		"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
Date: Wed, 08 Oct 2014 18:47:00 -0000	[thread overview]
Message-ID: <CADnVucDTz0zz4jOQr+QF4Xondnby_=RPtnWXED3cC-cRdBvc5g@mail.gmail.com> (raw)
In-Reply-To: <54256053.6050405@arm.com>

On 26 September 2014 13:47, Tejas Belagod <tejas.belagod@arm.com> wrote:
> If we use type-punning, there are unnecessary spills that are generated
> which is also incorrect for BE because of of the way we spill (st1 {v0.16b -
> v1.16b}, [sp]) and restore. The implementation without type-punning seems to
> give a more optimal result. Did your patches improve on the spills for the
> type-punning solution?

OK, this part seems too contentious, so I've respun the vldN_lane
parts without the type punning and reposted them. This issue can be
resolved separately.

Trying an example like this gives good code with type punning, and
poor code without.

void t2(int32_t *p)
{
    int32x4x4_t va = vld4q_s32(p);
    va = vld4q_lane_s32(p + 500, va, 1);
    vst4q_s32(p+1000, va);
}

With type-punning, good code:
t2:
        ld4     {v0.4s - v3.4s}, [x0]
        add     x2, x0, 2000
        add     x1, x0, 4000
        ld4     {v0.s - v3.s}[1], [x2]
        st4     {v0.4s - v3.4s}, [x1]
        ret

Without type-punning, horrible code:
t2:
        ld4     {v0.4s - v3.4s}, [x0]
        sub     sp, sp, #64
        add     x14, x0, 2000
        add     x0, x0, 4000
        umov    x12, v0.d[0]
        umov    x13, v0.d[1]
        umov    x10, v1.d[0]
        umov    x11, v1.d[1]
        umov    x8, v2.d[0]
        str     x12, [sp]
        umov    x9, v2.d[1]
        str     x13, [sp, 8]
        str     q3, [sp, 48]
        str     x10, [sp, 16]
        str     x11, [sp, 24]
        str     x8, [sp, 32]
        str     x9, [sp, 40]
        ld1     {v0.16b - v3.16b}, [sp]
        ld4     {v0.s - v3.s}[1], [x14]
        umov    x10, v0.d[0]
        umov    x11, v0.d[1]
        umov    x8, v1.d[0]
        umov    x9, v1.d[1]
        umov    x6, v2.d[0]
        str     x10, [sp]
        umov    x7, v2.d[1]
        str     x11, [sp, 8]
        str     q3, [sp, 48]
        str     x8, [sp, 16]
        str     x9, [sp, 24]
        str     x6, [sp, 32]
        str     x7, [sp, 40]
        ld1     {v0.16b - v3.16b}, [sp]
        add     sp, sp, 64
        st4     {v0.4s - v3.4s}, [x0]
        ret

>> Maybe the solution is to pass the NEON
>> intrinsic types directly to the builtins? Is there a reason that it
>> wasn't done that way before?
>
> How do you mean? Do you mean pass a loaded value int32x2x2_t into a
> __builtin? How will that work?
>
> If you mean why we don't pass an int32x2x2_t into a builtin as a structure,
> I don't think that would work as it is struct type which would correspond to
> a  BLK mode, but we need RTL patterns with reg-lists to work with large int
> modes for the regalloc to allocate consecutive regs for the reglists.

OK, that makes sense. However, something needs to be done to create
the __arch64_simd_ objects without register moves. Since the existing
mechanism causes problems because the lifetimes of the inputs overlap
with the lifetimes of the outputs, I think there are these options:

1. represent the construction/deconstruction as a single operation, to
avoid overlapping variable liveness in the source.
2. add a pass or peephole which can combine the existing builtins into
a single operation, so that the lifetimes are normalised.
3. teach the register allocator how to handle overlapping liveness of
a register and a subreg of that register.

Option 1 would require a new builtin interface which somehow handled a
whole int32x2x2_t in one operation. Construction is easy
(__builtin_aarch64_simd_construct(v.val[0], v.val[1]) or similar).
Deconstruction is less obvious

Option 2 sounds like a hack, but would probably be effective,
particularly if it can be done before inlining.

Option 3 would also help with poor code generation for ARM targets
with vget_low_*, vget_high_* and vcombine_*.

What do you think is the best approach?

Thanks
Charles

next prev parent reply	other threads:[~2014-10-08 18:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-18 19:40 [PATCH 0/4] [AARCH64,NEON] Improve various NEON load/store intrinsics Charles Baylis
2014-09-18 19:40 ` [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_* Charles Baylis
2014-09-19 11:21   ` Tejas Belagod
2014-09-26  1:16     ` Charles Baylis
2014-09-26 12:47       ` Tejas Belagod
2014-10-08 18:47         ` Charles Baylis [this message]
2014-09-18 19:41 ` [PATCH 3/4] [AARCH64,NEON] Fix unnecessary moves in vld[234]q_* intrinsics Charles Baylis
2014-09-18 19:41 ` [PATCH 4/4] [AARCH64,NEON] Fix unnecessary moves in vst[234]q_* intrinsics Charles Baylis
2014-09-18 19:41 ` [PATCH 1/4] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics Charles Baylis
2014-09-19  8:40   ` Kyrill Tkachov
2014-09-19 10:46   ` Tejas Belagod
2014-09-24 16:36     ` Charles Baylis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADnVucDTz0zz4jOQr+QF4Xondnby_=RPtnWXED3cC-cRdBvc5g@mail.gmail.com' \
    --to=charles.baylis@linaro.org \
    --cc=Marcus.Shawcroft@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=tejas.belagod@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).