Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

From: Dennis Zhang <dennis.zhang@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>,
	Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
	richard.sandiford@arm.com
Subject: Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector
Date: Tue, 3 Nov 2020 17:00:33 +0000
Message-ID: <0e864f47-4e3b-46cf-8ca8-4b1e42b4f408@arm.com>
In-Reply-To: <mptlffio7rf.fsf@arm.com>

On 11/3/20 2:05 PM, Richard Sandiford wrote:
> Dennis Zhang <dennis.zhang@arm.com> writes:
>> Hi Richard,
>>
>> On 10/30/20 2:07 PM, Richard Sandiford wrote:
>>> Dennis Zhang <Dennis.Zhang@arm.com> writes:
>>>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
>>>> index 332a0b6b1ea..39ebb776d1d 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>>>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>>>> @@ -719,6 +719,9 @@
>>>>      VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
>>>>      VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
>>>>    
>>>> +  /* Implemented by aarch64_vget_halfv8bf.  */
>>>> +  VAR1 (GETREG, vget_half, 0, ALL, v8bf)
>>>
>>> This should be AUTO_FP, since it doesn't have any side-effects.
>>> (As before, we should probably rename the flag, but that's separate work.)
>>>
>>>> +
>>>>      /* Implemented by aarch64_simd_<sur>mmlav16qi.  */
>>>>      VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
>>>>      VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>>> index 9f0e2bd1e6f..f62c52ca327 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -7159,6 +7159,19 @@
>>>>      [(set_attr "type" "neon_dot<VDQSF:q>")]
>>>>    )
>>>>    
>>>> +;; vget_low/high_bf16
>>>> +(define_expand "aarch64_vget_halfv8bf"
>>>> +  [(match_operand:V4BF 0 "register_operand")
>>>> +   (match_operand:V8BF 1 "register_operand")
>>>> +   (match_operand:SI 2 "aarch64_zero_or_1")]
>>>> +  "TARGET_BF16_SIMD"
>>>> +{
>>>> +  int hbase = INTVAL (operands[2]);
>>>> +  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
>>>
>>> I think this needs to be:
>>>
>>>     aarch64_simd_vect_par_cnst_half
>>>
>>> instead.  The issue is that on big-endian targets, GCC assumes vector
>>> lane 0 is in the high part of the register, whereas for AArch64 it's
>>> always in the low part of the register.  So we convert from AArch64
>>> numbering to GCC numbering when generating the rtx and then take
>>> endianness into account when matching the rtx later.
>>>
>>> It would be good to have -mbig-endian tests that make sure we generate
>>> the right instruction for each function (i.e. we get them the right way
>>> round).  I guess it would be good to test that for little-endian too.
>>>
>>
>> I've updated the expander to use aarch64_simd_vect_par_cnst_half.
>> The expander is also split into two, one for getting the low half and
>> one for getting the high half.
>> It's tested on aarch64-none-linux-gnu and aarch64_be-none-linux-gnu
>> targets, with new tests that use the -mbig-endian option.
>>
>>>> +  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
>>>> +  DONE;
>>>> +})
>>>> +
>>>>    ;; bfmmla
>>>>    (define_insn "aarch64_bfmmlaqv4sf"
>>>>      [(set (match_operand:V4SF 0 "register_operand" "=w")
>>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>>> index 215fcec5955..0c8bc2b0c73 100644
>>>> --- a/gcc/config/aarch64/predicates.md
>>>> +++ b/gcc/config/aarch64/predicates.md
>>>> @@ -84,6 +84,10 @@
>>>>    		 (ior (match_test "op == constm1_rtx")
>>>>    		      (match_test "op == const1_rtx"))))))
>>>>    
>>>> +(define_predicate "aarch64_zero_or_1"
>>>> +  (and (match_code "const_int")
>>>> +       (match_test "op == const0_rtx || op == const1_rtx")))
>>>
>>> zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
>>> But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
>>> so let's keep it as-is.
>>>
>>
>> This predicate is removed, since the new expanders no longer need the
>> imm operand.
>>
>> Thanks for the reviews.
>> Is it OK for trunk now?
> 
> Looks good.  OK for trunk and branches, thanks.
> 
> Richard
> 

Thanks for the approval, Richard!
This patch is committed as 3553c658533e430b232997bdfd97faf6606fb102.

Best,
Dennis
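
For readers following the endianness point raised in the review, here is a
minimal sketch in plain C of how the lane selector differs between the two
approaches. It is a model only, not GCC source: the function names
model_par_cnst_half and model_stepped_parallel and the big_endian parameter
(standing in for the target's byte order) are hypothetical, and the logic is
my reading of the behaviour described in the thread, where the half
selection swaps on big-endian because GCC numbers lanes from the other end
of the register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC source: fill SEL with the GCC-numbered lane
   indices that select one half of an NUNITS-lane vector.  HIGH requests the
   architectural high half; BIG_ENDIAN stands in for the target byte order.  */
void
model_par_cnst_half (int *sel, int nunits, bool high, bool big_endian)
{
  int half = nunits / 2;
  /* On big-endian, GCC lane 0 is at the opposite end of the register from
     architectural lane 0, so the two halves swap places.  */
  int base = (high != big_endian) ? half : 0;
  for (int i = 0; i < half; i++)
    sel[i] = base + i;
}

/* Hypothetical model of the first version of the patch: the selector is
   BASE, BASE + STEP, ... with no endianness adjustment at all.  */
void
model_stepped_parallel (int *sel, int count, int base, int step)
{
  for (int i = 0; i < count; i++)
    sel[i] = base + i * step;
}
```

Under this model, asking for the high half of an 8-lane vector gives lanes
4..7 on little-endian and 0..3 on big-endian, whereas the stepped selector
gives 4..7 in both cases. That mismatch is the kind of error the
-mbig-endian tests are meant to catch.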
