From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 477C7385C332
	for <binutils@sourceware.org>; Fri, 30 Sep 2022 14:54:15 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 477C7385C332
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B754723A;
	Fri, 30 Sep 2022 07:54:21 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 00D6C3F792;
	Fri, 30 Sep 2022 07:54:13 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "dongbo \(E\)" <dongbo4@huawei.com>
Mail-Followup-To: "dongbo \(E\)" <dongbo4@huawei.com>,Shaokun Zhang via Binutils <binutils@sourceware.org>,  Shaokun Zhang <zhangshaokun@hisilicon.com>,  Jingtao Cai <caijingtao@huawei.com>,  "Richard Earnshaw" <rearnsha@arm.com>,  Marcus Shawcroft <marcus.shawcroft@arm.com>,  Jan Beulich <jbeulich@suse.com>, richard.sandiford@arm.com
Cc: Shaokun Zhang via Binutils <binutils@sourceware.org>,  Shaokun Zhang <zhangshaokun@hisilicon.com>,  Jingtao Cai <caijingtao@huawei.com>,  "Richard Earnshaw" <rearnsha@arm.com>,  Marcus Shawcroft <marcus.shawcroft@arm.com>,  Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH RESEND v2] Aarch64: Allow explicit size specifier for predicate operand of {sq, uq, }{incp, decp}
References: <20220216005311.26184-1-zhangshaokun@hisilicon.com>
	<mptpmg180rr.fsf@arm.com>
	<f05ac6b0-fd13-fd37-e228-c49e55526d44@huawei.com>
Date: Fri, 30 Sep 2022 15:54:12 +0100
In-Reply-To: <f05ac6b0-fd13-fd37-e228-c49e55526d44@huawei.com> (dongbo's
	message of "Wed, 28 Sep 2022 10:19:50 +0800")
Message-ID: <mptbkqwoquj.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Status: No, score=-43.7 required=5.0 tests=BAYES_00,BODY_8BITS,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

"dongbo (E)" <dongbo4@huawei.com> writes:
> Hi, thank you all for the replies.
>
> On 2022/9/12 16:17, Richard Sandiford wrote:
>> Hi,
>>
>> Thanks for the patch and sorry for the slow reply.  For the record,
>> I'm also not an aarch64 maintainer, but Richard E and I were talking
>> about this last week.
>>
>> Shaokun Zhang via Binutils <binutils@sourceware.org> writes:
>>> From: Jingtao Cai <caijingtao@huawei.com>
>>>
>>> Omitting predicate size specifier in vector form of {sq, uq, }{decp, in=
cp} is deprecated and will be prohibited in a future release of the aarch64,
>>> see https://developer.arm.com/documentation/ddi0602/2021-09/SVE-Instruc=
tions/DECP--vector---Decrement-vector-by-count-of-true-predicate-elements-.
>>>
>>> This allows explicit size specifier, e.g. `decp z0.h, p0.h`, for predic=
ate operand of these SVE instructions.
>>> The exsiting behaviour of not requiring the specifier is preserved, but=
 will be warned.
>>> And the disasembly is with the specifier with this patch.
>> The patch is definitely doing the right thing by the architecture
>> definition.  However, given that the old syntax has been in use for
>> a while, it might be quite disruptive to deprecate the old syntax
>> now in the GNU tools.  When Richard and I discussed the various ways
>> of coping with the deprecation in GCC, none of the options seemed
>> particularly appealing.
>>
>> It's extremely unlikely that the old syntax would be repurposed to mean
>> something else.  Therefore, Richard and I thought that it might be better
>> to continue to accept the old syntax alongside the new syntax in the GNU
>> tools, as a GNU extension to the official assembler spec, without the
>> deprecation message.  Sorry to raise this after so long, and after you'd
>> already done the work.
>
> OK, I think we can delete the warning to make this done.
>
>>> diff --git a/opcodes/aarch64-tbl.h b/opcodes/aarch64-tbl.h
>>> index cb039d63eba3..794e2d24f528 100644
>>> --- a/opcodes/aarch64-tbl.h
>>> +++ b/opcodes/aarch64-tbl.h
>>> @@ -4200,6 +4200,7 @@ const struct aarch64_opcode aarch64_opcode_table[=
] =3D
>>>     _SVE_INSN ("decd", 0x04f0e400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SV=
E_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0),
>>>     _SVE_INSNC ("dech", 0x0470c400, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Z=
d, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPR=
FX, 0),
>>>     _SVE_INSN ("dech", 0x0470e400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SV=
E_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0),
>>> +  _SVE_INSNC ("decp", 0x252d8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SV=
E_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0),
>>>     _SVE_INSNC ("decp", 0x252d8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (S=
VE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0),
>> I haven't tested whether this works, but rather than have two entries per
>> instruction, I think it'd be better to add a new OP_SVE_ macro that lists
>> both the qualified and unqualified forms.  The current naming scheme is:
>>
>> /* The naming convention for SVE macros is:
>>
>>     OP_SVE_<operands>[_<sizes>]*
>>
>>     <operands> contains one character per operand, using the following s=
cheme:
>>
>>     - U: the operand is unqualified (NIL).
>>
>>     - [BHSD]: the operand has a S_[BHSD] qualifier and the choice of
>>       qualifier is the same for all variants.  This is used for both
>>       .[BHSD] suffixes on an SVE predicate or vector register and
>>       scalar FPRs of the form [BHSD]<number>.
>>
>>     - [WX]: the operand has a [WX] qualifier and the choice of qualifier
>>       is the same for all variants.
>>
>>     - [ZM]: the operand has a /[ZM] suffix and the choice of suffix
>>       is the same for all variants.
>>
>>     - V: the operand has a S_[BHSD] qualifier and the choice of qualifier
>>       is not the same for all variants.
>>
>>     - R: the operand has a [WX] qualifier and the choice of qualifier is
>>       not the same for all variants.
>>
>>     - P: the operand has a /[ZM] suffix and the choice of suffix is not
>>       the same for all variants.
>>
>>     The _<sizes>, if present, give the subset of [BHSD] that are accepted
>>     by the V entries in <operands>.  */
>>
>> so maybe we could use a lower case "v" to represent "like V, but the
>> qualifier is optional".  That is:
>>
>> #define OP_SVE_Vv_HSD                                   \
>> {                                                       \
>>    QLF2(S_H,NIL),                                        \
>>    QLF2(S_S,NIL),                                        \
>>    QLF2(S_D,NIL),                                        \
>>    QLF2(S_H,S_H),                                        \
>>    QLF2(S_S,S_S),                                        \
>>    QLF2(S_D,S_D),                                        \
>> }
>>
>> (Not sure how we ended up with two definitions of OP_SVE_VU_HSD
>> in aarch64-tbl.h -- oops.)
>
> We tested the `OP_SVE_Vv_HSD` above.
>
> For `decp z0.h, p0.h`, we also need to modify `sve_size_hsd` encoding:
>
> ```
>
> opcodes/aarch64-asm.c:
>  =C2=A0=C2=A0=C2=A0 aarch64_encode_variant_using_iclass (struct aarch64_i=
nst *inst) {
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ...
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 case sve_size_hsd:
>  =C2=A0-=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 insert_f=
ield (FLD_size, &inst->value, aarch64_get_variant=20
> (inst) + 1, 0);
> +=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 insert_field (F=
LD_size, &inst->value, aarch64_get_variant=20
> (inst) % 3 + 1, 0);
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 break;
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ...
>  =C2=A0=C2=A0=C2=A0 }
>
> ```
>
> Both `decp z0.h, p0` and `decp z0.h, p0.h` can be encoded correctly.

Ah, yeah, that looks right.

> However, what bothers us is `decp` instructions are disassembled to=20
> unqualified versions, i.e. `decp z0.h, p0`,
>
> because `aarch64_decode_variant_using_iclass` chooses unqualified variant:
>
> ```
>
>  =C2=A0=C2=A0=C2=A0 aarch64_decode_variant_using_iclass (aarch64_inst *in=
st) {
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ....
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 case sve_size_hsd:
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 i =3D extract_f=
ield (FLD_size, inst->value, 0);
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 if (i < 1)
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=
=A0 return false;
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 variant =3D i -=
 1;
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 break;
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ...
>  =C2=A0=C2=A0=C2=A0 }
>
> ```
>
> It's better to get `decp z0.h, p0.h`, right?
>
>
> We tried to put `S_H` in front of `NIL`:
>
> ```
>
>  =C2=A0=C2=A0=C2=A0 #define OP_SVE_Vv_HSD=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
> {=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
 \
>  =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_H,S_H),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
>  =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_S,S_S),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
>  =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_D,S_D),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
>  =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_H,NIL),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
>  =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_S,NIL),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
>  =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_D,NIL),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \
>  =C2=A0=C2=A0=C2=A0 }
>
> ```

Yeah, good point.  It should be in this order, like you say.

The fixes you mention look correct to me.

> But assembler will fail in `match_operands_qualifier` :(.
>
> ```
>
>  =C2=A0=C2=A0=C2=A0 match_operands_qualifier (aarch64_inst *inst, bool up=
date_p)
>  =C2=A0=C2=A0=C2=A0 {
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ...
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 if (!aarch64_find_best_match (...))
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ...
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 if (inst->opcode->flags & F_STRICT)
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 {
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 /* Require an e=
xact qualifier match, even for NIL=20
> qualifiers.=C2=A0 */
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 nops =3D aarch6=
4_num_of_operands (inst->opcode);
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 for (i =3D 0; i=
 < nops; ++i)
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=
=A0 if (inst->operands[i].qualifier !=3D qualifiers[i])
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=
=A0 =C2=A0=C2=A0=C2=A0 return false;
>  =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 }
>  =C2=A0=C2=A0=C2=A0 }
>
> ```

I think that's a misfeature of the F_STRICT handling.  Does it work
with the patch below?

Thanks,
Richard


--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline;
 filename=0001-data-ref-Fix-ranges_maybe_overlap_p-test.patch

From: Richard Sandiford <richard.sandiford@arm.com>
Subject: [PATCH] data-ref: Fix ranges_maybe_overlap_p test
To: gcc-patches@gcc.gnu.org
Gcc: private.sent
--text follows this line--
dr_may_alias_p rightly used poly_int_tree_p to guard a use of
ranges_maybe_overlap_p, but used the non-poly extractors.
This caused a few failures in the SVE ACLE asm tests.

gcc/
	* tree-data-ref.cc (dr_may_alias_p): Use to_poly_widest instead
	of to_widest.
---
 gcc/tree-data-ref.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 91bfb619d66..978c3f002f7 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -2979,10 +2979,10 @@ dr_may_alias_p (const struct data_reference *a, const struct data_reference *b,
 	  && operand_equal_p (DR_OFFSET (a), DR_OFFSET (b))
 	  && poly_int_tree_p (tree_size_a)
 	  && poly_int_tree_p (tree_size_b)
-	  && !ranges_maybe_overlap_p (wi::to_widest (DR_INIT (a)),
-				      wi::to_widest (tree_size_a),
-				      wi::to_widest (DR_INIT (b)),
-				      wi::to_widest (tree_size_b)))
+	  && !ranges_maybe_overlap_p (wi::to_poly_widest (DR_INIT (a)),
+				      wi::to_poly_widest (tree_size_a),
+				      wi::to_poly_widest (DR_INIT (b)),
+				      wi::to_poly_widest (tree_size_b)))
 	{
 	  gcc_assert (integer_zerop (DR_STEP (a))
 		      && integer_zerop (DR_STEP (b)));
-- 
2.25.1


--=-=-=--