From: Dennis Zhang <dennis.zhang@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>,
Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
richard.sandiford@arm.com
Subject: Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector
Date: Tue, 3 Nov 2020 17:00:33 +0000
Message-ID: <0e864f47-4e3b-46cf-8ca8-4b1e42b4f408@arm.com>
In-Reply-To: <mptlffio7rf.fsf@arm.com>
On 11/3/20 2:05 PM, Richard Sandiford wrote:
> Dennis Zhang <dennis.zhang@arm.com> writes:
>> Hi Richard,
>>
>> On 10/30/20 2:07 PM, Richard Sandiford wrote:
>>> Dennis Zhang <Dennis.Zhang@arm.com> writes:
>>>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
>>>> index 332a0b6b1ea..39ebb776d1d 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>>>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>>>> @@ -719,6 +719,9 @@
>>>> VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
>>>> VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
>>>>
>>>> + /* Implemented by aarch64_vget_halfv8bf. */
>>>> + VAR1 (GETREG, vget_half, 0, ALL, v8bf)
>>>
>>> This should be AUTO_FP, since it doesn't have any side-effects.
>>> (As before, we should probably rename the flag, but that's separate work.)
>>>
>>>> +
>>>> /* Implemented by aarch64_simd_<sur>mmlav16qi. */
>>>> VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
>>>> VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>>> index 9f0e2bd1e6f..f62c52ca327 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -7159,6 +7159,19 @@
>>>> [(set_attr "type" "neon_dot<VDQSF:q>")]
>>>> )
>>>>
>>>> +;; vget_low/high_bf16
>>>> +(define_expand "aarch64_vget_halfv8bf"
>>>> + [(match_operand:V4BF 0 "register_operand")
>>>> + (match_operand:V8BF 1 "register_operand")
>>>> + (match_operand:SI 2 "aarch64_zero_or_1")]
>>>> + "TARGET_BF16_SIMD"
>>>> +{
>>>> + int hbase = INTVAL (operands[2]);
>>>> + rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
>>>
>>> I think this needs to be:
>>>
>>> aarch64_simd_vect_par_cnst_half
>>>
>>> instead. The issue is that on big-endian targets, GCC assumes vector
>>> lane 0 is in the high part of the register, whereas for AArch64 it's
>>> always in the low part of the register. So we convert from AArch64
>>> numbering to GCC numbering when generating the rtx and then take
>>> endianness into account when matching the rtx later.
>>>
>>> It would be good to have -mbig-endian tests that make sure we generate
>>> the right instruction for each function (i.e. we get them the right way
>>> round). I guess it would be good to test that for little-endian too.
>>>
>>
>> I've updated the expander to use aarch64_simd_vect_par_cnst_half.
>> The expander is also split into two, one for getting the low half and
>> one for getting the high half.
>> It's tested on the aarch64-none-linux-gnu and aarch64_be-none-linux-gnu
>> targets, with new tests that include the -mbig-endian option.
>>
>>>> + emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
>>>> + DONE;
>>>> +})
>>>> +
>>>> ;; bfmmla
>>>> (define_insn "aarch64_bfmmlaqv4sf"
>>>> [(set (match_operand:V4SF 0 "register_operand" "=w")
>>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>>> index 215fcec5955..0c8bc2b0c73 100644
>>>> --- a/gcc/config/aarch64/predicates.md
>>>> +++ b/gcc/config/aarch64/predicates.md
>>>> @@ -84,6 +84,10 @@
>>>> (ior (match_test "op == constm1_rtx")
>>>> (match_test "op == const1_rtx"))))))
>>>>
>>>> +(define_predicate "aarch64_zero_or_1"
>>>> + (and (match_code "const_int")
>>>> + (match_test "op == const0_rtx || op == const1_rtx")))
>>>
>>> zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
>>> But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
>>> so let's keep it as-is.
>>>
>>
>> This predicate has been removed, since the new expanders no longer
>> need the immediate operand.
>>
>> Thanks for the reviews.
>> Is it OK for trunk now?
>
> Looks good. OK for trunk and branches, thanks.
>
> Richard
>
Thanks for the approval, Richard!
The patch has been committed as 3553c658533e430b232997bdfd97faf6606fb102.

Best regards,
Dennis
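
Editorial sketch (not part of the original message or of the patch): the lane-numbering point Richard raises in the review can be modelled in a few lines of portable C. The function name half_lane_indices and its parameters are hypothetical; the code only illustrates the rule described above, that on big-endian targets GCC's lane 0 sits at the opposite end of the register from the architectural lane 0, so the architectural low half is named by the upper GCC lane numbers. It is an assumption-laden model, not the implementation of aarch64_simd_vect_par_cnst_half.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC source.  Fill OUT (NUNITS / 2 entries) with
   the GCC lane numbers that select the architectural low half (HIGH ==
   false) or high half (HIGH == true) of an NUNITS-lane vector.  */
static void
half_lane_indices (int nunits, bool high, bool big_endian, int *out)
{
  int half = nunits / 2;

  /* Little-endian: GCC and architectural numbering agree, so the high
     half starts at lane HALF and the low half at lane 0.
     Big-endian: GCC numbering is reversed, so the two bases swap.  */
  int base = (high != big_endian) ? half : 0;

  /* The selection is always an ascending run of HALF consecutive lanes.  */
  for (int i = 0; i < half; i++)
    out[i] = base + i;
}
```

Under this model, an 8-lane vector such as V8BF gives lanes 0 to 3 for the low half and 4 to 7 for the high half on little-endian, and the reverse assignment on big-endian. That swap is why the review asks for -mbig-endian tests as well as little-endian ones.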