Re: [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Tejas Belagod <tejas.belagod@arm.com>
To: Charles Baylis <charles.baylis@linaro.org>
Cc: Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	 Richard Earnshaw <Richard.Earnshaw@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
Date: Fri, 26 Sep 2014 12:47:00 -0000	[thread overview]
Message-ID: <54256053.6050405@arm.com> (raw)
In-Reply-To: <CADnVucDex9oU9V-hPpFOhoX--DmmutDq4XuZUV7fBwLNFViEMw@mail.gmail.com>

On 26/09/14 02:16, Charles Baylis wrote:
> On 19 September 2014 12:21, Tejas Belagod <tejas.belagod@arm.com> wrote:
>> The reason we avoided using type-punning using unions was that reload would
>> get confused with potential subreg(mem) that could be introduced because of
>> memory xfer caused by unions and large int modes. As a result, we would get
>> incorrect or sub-optimal code. But this seems to have fixed itself. :-)
>>
>> Because this involves xfers between large int modes and
>> CANNOT_CHANGE_MODE_CLASS has some impact on it, it would be good to test
>> what impact your patch has with C_C_M_C removed, so that it will be easier
>> to fix the fallout once we remove C_C_M_C eventually. To test this you will
>> need Richard's patch set
>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01440.html.
>>
>> Same for your other 2 patches in this series(3,4).
>
> I tried those patches, and altered aarch64_cannot_change_mode_class to
> return false for all cases.
>
> However, this does not avoid the unnecessary moves.
>
> Taking a really simple test case:
>
> #include <arm_neon.h>
>
> int32x2x2_t xvld2_s32(int32_t *__a)
> {
>    int32x2x2_t ret;
>    __builtin_aarch64_simd_oi __o;
>    __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a);
>    ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
>    ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
>    return ret;
> }
>
> (disabling scheduling for clarity)
> $ aarch64-oe-linux-gcc -O2 -S -o - simd.c -fno-schedule-insns
> -fno-schedule-insns2
>          ...
> xvld2_s32:
>          ld2     {v2.2s - v3.2s}, [x0]
>          orr     v0.8b, v2.8b, v2.8b
>          orr     v1.8b, v3.8b, v3.8b
>          ret
>          ...
>
>
> The reason is apparent in the rtl dump from ira:
> ...
>        Allocno a0r73 of FP_REGS(32) has 31 avail. regs  33-63, node:
> 33-63 (confl regs =  0-32 64 65)
> ...
> (insn 2 4 3 2 (set (reg/v/f:DI 79 [ __a ])
>          (reg:DI 0 x0 [ __a ])) simd.c:5 34 {*movdi_aarch64}
>       (expr_list:REG_DEAD (reg:DI 0 x0 [ __a ])
>          (nil)))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 20 2 (set (reg/v:OI 73 [ __o ])
>          (subreg:OI (vec_concat:V8SI (vec_concat:V4SI (unspec:V2SI [
>                              (mem:TI (reg/v/f:DI 79 [ __a ]) [0  S16 A8])
>                          ] UNSPEC_LD2)
>                      (vec_duplicate:V2SI (const_int 0 [0])))
>                  (vec_concat:V4SI (unspec:V2SI [
>                              (mem:TI (reg/v/f:DI 79 [ __a ]) [0  S16 A8])
>                          ] UNSPEC_LD2)
>                      (vec_duplicate:V2SI (const_int 0 [0])))) 0))
> simd.c:8 2149 {aarch64_ld2v2si_dreg}
>       (expr_list:REG_DEAD (reg/v/f:DI 79 [ __a ])
>          (nil)))
> (insn 20 6 21 2 (set (reg:V2SI 32 v0)
>          (subreg:V2SI (reg/v:OI 73 [ __o ]) 0)) simd.c:12 778
> {*aarch64_simd_movv2si}
>       (nil))
> (insn 21 20 22 2 (set (reg:V2SI 33 v1)
>          (subreg:V2SI (reg/v:OI 73 [ __o ]) 16)) simd.c:12 778
> {*aarch64_simd_movv2si}
>       (expr_list:REG_DEAD (reg/v:OI 73 [ __o ])
>          (nil)))
> (insn 22 21 23 2 (use (reg:V2SI 32 v0)) simd.c:12 -1
>       (nil))
> (insn 23 22 0 2 (use (reg:V2SI 33 v1)) simd.c:12 -1
>       (nil))
>
> The register allocator considers r73 to conflict with v0, because they
> are simultaneously live after insn 20. Without the 2nd use of v73 (eg
> if the write to res.val[1] is replaced with vdup_n_s32(0) ) then the
> allocator does do the right thing with the subreg and allocates v73 to
> {v0,v1}.
>
> I haven't read all of the old threads relating to Richard's patches
> yet, but I don't see why they would affect this issue.
>
> I don't think the register allocator is able to resolve this unless
> the conversion between the __builtin_simd type and the int32x4x2_t
> type is done as a single operation.
>

For this piece of code,

#include "arm_neon.h"

int32x2x2_t xvld2_s32(int32_t *__a)
{
   union { int32x2x2_t __i;
          __builtin_aarch64_simd_oi __o; } __temp;
   __temp.__o = __builtin_aarch64_ld2v2si ((const 
__builtin_aarch64_simd_si *) __a);
   return __temp.__i;
}

int32x2x2_t yvld2_s32(int32_t *__a)
{
   int32x2x2_t ret;
   __builtin_aarch64_simd_oi __o;
   __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) 
__a);
   ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
   ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
   return ret;
}

currently my gcc HEAD generates at -O3:

xvld2_s32:
	ld2	{v0.2s - v1.2s}, [x0]
	sub	sp, sp, #64
	st1	{v0.16b - v1.16b}, [sp]
	ldr	x1, [sp]
	ldr	x0, [sp, 8]
	add	sp, sp, 64
	ins	v0.d[0], x1
	ins	v1.d[0], x0
	ret
         ....
yvld2_s32:
	ld2	{v2.2s - v3.2s}, [x0]
	orr	v1.8b, v3.8b, v3.8b
	orr	v0.8b, v2.8b, v2.8b
	ret

If we use type-punning, there are unnecessary spills that are generated 
which is also incorrect for BE because of of the way we spill (st1 
{v0.16b - v1.16b}, [sp]) and restore. The implementation without 
type-punning seems to give a more optimal result. Did your patches 
improve on the spills for the type-punning solution?

> However, type-punning is not possible with the arrays of 64 bit
> vectors, as the arrays are not the same size as the corresponding
> __builtin_simd types, and any solution for those would probably help
> with the q variants too.

That is because we fill a zero-extended D-reg value into a 128-bit reg 
and pack them into an large int mode(eg. OI). We don't have large int 
modes made up of purely D-regs because we run into ambiguities like 4 
D-regs is an OImode and 2 Q-regs is also an OImode.

> Maybe the solution is to pass the NEON
> intrinsic types directly to the builtins? Is there a reason that it
> wasn't done that way before?
>

How do you mean? Do you mean pass a loaded value int32x2x2_t into a 
__builtin? How will that work?

If you mean why we don't pass an int32x2x2_t into a builtin as a 
structure, I don't think that would work as it is struct type which 
would correspond to a  BLK mode, but we need RTL patterns with reg-lists 
to work with large int modes for the regalloc to allocate consecutive 
regs for the reglists.

Thanks,
Tejas.

next prev parent reply	other threads:[~2014-09-26 12:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-18 19:40 [PATCH 0/4] [AARCH64,NEON] Improve various NEON load/store intrinsics Charles Baylis
2014-09-18 19:40 ` [PATCH 2/4] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_* Charles Baylis
2014-09-19 11:21   ` Tejas Belagod
2014-09-26  1:16     ` Charles Baylis
2014-09-26 12:47       ` Tejas Belagod [this message]
2014-10-08 18:47         ` Charles Baylis
2014-09-18 19:41 ` [PATCH 3/4] [AARCH64,NEON] Fix unnecessary moves in vld[234]q_* intrinsics Charles Baylis
2014-09-18 19:41 ` [PATCH 1/4] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics Charles Baylis
2014-09-19  8:40   ` Kyrill Tkachov
2014-09-19 10:46   ` Tejas Belagod
2014-09-24 16:36     ` Charles Baylis
2014-09-18 19:41 ` [PATCH 4/4] [AARCH64,NEON] Fix unnecessary moves in vst[234]q_* intrinsics Charles Baylis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54256053.6050405@arm.com \
    --to=tejas.belagod@arm.com \
    --cc=Marcus.Shawcroft@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=charles.baylis@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).