Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: James Greenhalgh <james.greenhalgh@arm.com>
To: Jiangjiji <jiangjiji@huawei.com>
Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
	       "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	       Richard Earnshaw <Richard.Earnshaw@arm.com>,
	       "Yangfei (Felix)" <felix.yang@huawei.com>,
	       Marcus Shawcroft <Marcus.Shawcroft@arm.com>
Subject: Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
Date: Tue, 16 Jun 2015 09:06:00 -0000	[thread overview]
Message-ID: <20150616090529.GA31315@arm.com> (raw)
In-Reply-To: <B34C25384B9D7A428FF276D3AE7D6BDC7B510C98@nkgeml511-mbx.china.huawei.com>

On Tue, May 05, 2015 at 02:14:16PM +0100, Jiangjiji wrote:
> Hi Jamesï¼Œ
> 
> Thanks for your comment.
> 
> Seems we need a 'dup' before 'fmul' if we use the GCC vector extension syntax way.
> 
> Example:
>         dup     v1.2s, v1.s[0]
>         fmul    v0.2s, v1.2s, v0.2s
> 
> And we need another pattern to combine this two insns into 'fmul
> %0.2s,%1.2s,%2.s[0]', which is kind of complex.

This would be the best way to handle the intrinsic - then we can
potentially make a nice optimization of lots of additional code. I
think it would be best if we add that extra pattern to combine the two
instructions, then write the intrinsic as __a * __b.

Looking at the pattern you propose to add for vmul_n_f32, I'm not sure
why this does not currently work, maybe it is just as simple as swapping
the operand order in:

+(define_insn "aarch64_mul_n<mode>"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+        (mult:VMUL
+          (match_operand:VMUL 1 "register_operand" "w")
+          (vec_duplicate:VMUL
+            (match_operand:<VEL> 2 "register_operand" "<h_con>"))))]
+  "TARGET_SIMD"
+  "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"
+  [(set_attr "type" "neon_mul_<Vetype>_long")]
+)
+

To look like:

+(define_insn "aarch64_mul_n<mode>"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+        (mult:VMUL
+          (vec_duplicate:VMUL
+            (match_operand:<VEL> 2 "register_operand" "<h_con>"))
+          (match_operand:VMUL 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"
+  [(set_attr "type" "neon_mul_<Vetype>_long")]
+)
+

Which I think better matches the canonicalization rules for combine?


> BTW: maybe it's better to reconsider this issue after this patch, right?

Maybe, but I'd rather see this done now rather than later. Otherwise,
we have to add and remove the 

If you want to split your patch in to two parts, one part for the
vmul_n_* intrinsics, and one part for the other intrinsics, then I'd be
happy to review them as two separate patches.

Thanks,
James

> On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> > Hi,
> >   This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
> >   Regtested with aarch64-linux-gnu on QEMU.
> >   This patch has no regressions for aarch64_be-linux-gnu big-endian target too.
> >   OK for the trunk?
> > 
> > Thanks.
> > Jiang jiji
> > 
> > 
> > ----------
> > Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
> > 
> > Hi, Kyrill
> >   Thank you for your suggestion.
> >   I fixed it and regtested with aarch64-linux-gnu on QEMU.
> >   This patch has no regressions for aarch64_be-linux-gnu big-endian target too.
> >   OK for the trunk?
> 
> Hi Jiang,
> 
> I'm sorry that I've taken so long to get to this, I've been out of office for several weeks. I have one comment.
> 
> > +__extension__ static __inline float32x2_t __attribute__ 
> > +((__always_inline__))
> > +vmul_n_f32 (float32x2_t __a, float32_t __b) {
> > +  return __builtin_aarch64_mul_nv2sf (__a, __b); }
> > +
> 
> For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector extension syntax to allow us to write these as:
> 
>   __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
>   vmul_n_f32 (float32x2_t __a, float32_t __b)
>   {
>     return __a * __b;
>   }
> 
> It would be great if we could make that work.
> 
> Thanks,
> James
>

next prev parent reply	other threads:[~2015-06-16  9:05 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-12 11:16 [PING^2] " Jiangjiji
2015-03-12 11:44 ` Kyrill Tkachov
2015-03-14 10:09   ` Jiangjiji
2015-04-11 10:38   ` [PING^3] " Jiangjiji
2015-05-05  6:37     ` James Greenhalgh
2015-05-05 13:14       ` Jiangjiji
2015-06-16  9:06         ` James Greenhalgh [this message]
2015-05-04  8:52   ` [PING^4] " Jiangjiji

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150616090529.GA31315@arm.com \
    --to=james.greenhalgh@arm.com \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=Marcus.Shawcroft@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=felix.yang@huawei.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jiangjiji@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).