From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 477C7385C332 for ; Fri, 30 Sep 2022 14:54:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 477C7385C332 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B754723A; Fri, 30 Sep 2022 07:54:21 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 00D6C3F792; Fri, 30 Sep 2022 07:54:13 -0700 (PDT) From: Richard Sandiford To: "dongbo \(E\)" Mail-Followup-To: "dongbo \(E\)" ,Shaokun Zhang via Binutils , Shaokun Zhang , Jingtao Cai , "Richard Earnshaw" , Marcus Shawcroft , Jan Beulich , richard.sandiford@arm.com Cc: Shaokun Zhang via Binutils , Shaokun Zhang , Jingtao Cai , "Richard Earnshaw" , Marcus Shawcroft , Jan Beulich Subject: Re: [PATCH RESEND v2] Aarch64: Allow explicit size specifier for predicate operand of {sq, uq, }{incp, decp} References: <20220216005311.26184-1-zhangshaokun@hisilicon.com> Date: Fri, 30 Sep 2022 15:54:12 +0100 In-Reply-To: (dongbo's message of "Wed, 28 Sep 2022 10:19:50 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Status: No, score=-43.7 required=5.0 tests=BAYES_00,BODY_8BITS,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable "dongbo (E)" writes: > Hi, thank you all for the replies. > > On 2022/9/12 16:17, Richard Sandiford wrote: >> Hi, >> >> Thanks for the patch and sorry for the slow reply. For the record, >> I'm also not an aarch64 maintainer, but Richard E and I were talking >> about this last week. >> >> Shaokun Zhang via Binutils writes: >>> From: Jingtao Cai >>> >>> Omitting predicate size specifier in vector form of {sq, uq, }{decp, in= cp} is deprecated and will be prohibited in a future release of the aarch64, >>> see https://developer.arm.com/documentation/ddi0602/2021-09/SVE-Instruc= tions/DECP--vector---Decrement-vector-by-count-of-true-predicate-elements-. >>> >>> This allows explicit size specifier, e.g. `decp z0.h, p0.h`, for predic= ate operand of these SVE instructions. >>> The exsiting behaviour of not requiring the specifier is preserved, but= will be warned. >>> And the disasembly is with the specifier with this patch. >> The patch is definitely doing the right thing by the architecture >> definition. However, given that the old syntax has been in use for >> a while, it might be quite disruptive to deprecate the old syntax >> now in the GNU tools. When Richard and I discussed the various ways >> of coping with the deprecation in GCC, none of the options seemed >> particularly appealing. >> >> It's extremely unlikely that the old syntax would be repurposed to mean >> something else. Therefore, Richard and I thought that it might be better >> to continue to accept the old syntax alongside the new syntax in the GNU >> tools, as a GNU extension to the official assembler spec, without the >> deprecation message. Sorry to raise this after so long, and after you'd >> already done the work. > > OK, I think we can delete the warning to make this done. > >>> diff --git a/opcodes/aarch64-tbl.h b/opcodes/aarch64-tbl.h >>> index cb039d63eba3..794e2d24f528 100644 >>> --- a/opcodes/aarch64-tbl.h >>> +++ b/opcodes/aarch64-tbl.h >>> @@ -4200,6 +4200,7 @@ const struct aarch64_opcode aarch64_opcode_table[= ] =3D >>> _SVE_INSN ("decd", 0x04f0e400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SV= E_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >>> _SVE_INSNC ("dech", 0x0470c400, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Z= d, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPR= FX, 0), >>> _SVE_INSN ("dech", 0x0470e400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SV= E_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >>> + _SVE_INSNC ("decp", 0x252d8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SV= E_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >>> _SVE_INSNC ("decp", 0x252d8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (S= VE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), >> I haven't tested whether this works, but rather than have two entries per >> instruction, I think it'd be better to add a new OP_SVE_ macro that lists >> both the qualified and unqualified forms. The current naming scheme is: >> >> /* The naming convention for SVE macros is: >> >> OP_SVE_[_]* >> >> contains one character per operand, using the following s= cheme: >> >> - U: the operand is unqualified (NIL). >> >> - [BHSD]: the operand has a S_[BHSD] qualifier and the choice of >> qualifier is the same for all variants. This is used for both >> .[BHSD] suffixes on an SVE predicate or vector register and >> scalar FPRs of the form [BHSD]. >> >> - [WX]: the operand has a [WX] qualifier and the choice of qualifier >> is the same for all variants. >> >> - [ZM]: the operand has a /[ZM] suffix and the choice of suffix >> is the same for all variants. >> >> - V: the operand has a S_[BHSD] qualifier and the choice of qualifier >> is not the same for all variants. >> >> - R: the operand has a [WX] qualifier and the choice of qualifier is >> not the same for all variants. >> >> - P: the operand has a /[ZM] suffix and the choice of suffix is not >> the same for all variants. >> >> The _, if present, give the subset of [BHSD] that are accepted >> by the V entries in . */ >> >> so maybe we could use a lower case "v" to represent "like V, but the >> qualifier is optional". That is: >> >> #define OP_SVE_Vv_HSD \ >> { \ >> QLF2(S_H,NIL), \ >> QLF2(S_S,NIL), \ >> QLF2(S_D,NIL), \ >> QLF2(S_H,S_H), \ >> QLF2(S_S,S_S), \ >> QLF2(S_D,S_D), \ >> } >> >> (Not sure how we ended up with two definitions of OP_SVE_VU_HSD >> in aarch64-tbl.h -- oops.) > > We tested the `OP_SVE_Vv_HSD` above. > > For `decp z0.h, p0.h`, we also need to modify `sve_size_hsd` encoding: > > ``` > > opcodes/aarch64-asm.c: > =C2=A0=C2=A0=C2=A0 aarch64_encode_variant_using_iclass (struct aarch64_i= nst *inst) { > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ... > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 case sve_size_hsd: > =C2=A0-=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 insert_f= ield (FLD_size, &inst->value, aarch64_get_variant=20 > (inst) + 1, 0); > +=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 insert_field (F= LD_size, &inst->value, aarch64_get_variant=20 > (inst) % 3 + 1, 0); > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 break; > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ... > =C2=A0=C2=A0=C2=A0 } > > ``` > > Both `decp z0.h, p0` and `decp z0.h, p0.h` can be encoded correctly. Ah, yeah, that looks right. > However, what bothers us is `decp` instructions are disassembled to=20 > unqualified versions, i.e. `decp z0.h, p0`, > > because `aarch64_decode_variant_using_iclass` chooses unqualified variant: > > ``` > > =C2=A0=C2=A0=C2=A0 aarch64_decode_variant_using_iclass (aarch64_inst *in= st) { > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 .... > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 case sve_size_hsd: > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 i =3D extract_f= ield (FLD_size, inst->value, 0); > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 if (i < 1) > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2= =A0 return false; > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 variant =3D i -= 1; > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 break; > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ... > =C2=A0=C2=A0=C2=A0 } > > ``` > > It's better to get `decp z0.h, p0.h`, right? > > > We tried to put `S_H` in front of `NIL`: > > ``` > > =C2=A0=C2=A0=C2=A0 #define OP_SVE_Vv_HSD=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > {=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= \ > =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_H,S_H),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_S,S_S),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_D,S_D),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_H,NIL),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_S,NIL),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > =C2=A0=C2=A0=C2=A0 =C2=A0 QLF2(S_D,NIL),=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 \ > =C2=A0=C2=A0=C2=A0 } > > ``` Yeah, good point. It should be in this order, like you say. The fixes you mention look correct to me. > But assembler will fail in `match_operands_qualifier` :(. > > ``` > > =C2=A0=C2=A0=C2=A0 match_operands_qualifier (aarch64_inst *inst, bool up= date_p) > =C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ... > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 if (!aarch64_find_best_match (...)) > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 ... > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 if (inst->opcode->flags & F_STRICT) > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 /* Require an e= xact qualifier match, even for NIL=20 > qualifiers.=C2=A0 */ > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 nops =3D aarch6= 4_num_of_operands (inst->opcode); > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 for (i =3D 0; i= < nops; ++i) > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2= =A0 if (inst->operands[i].qualifier !=3D qualifiers[i]) > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2= =A0 =C2=A0=C2=A0=C2=A0 return false; > =C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0 } > > ``` I think that's a misfeature of the F_STRICT handling. Does it work with the patch below? Thanks, Richard --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-data-ref-Fix-ranges_maybe_overlap_p-test.patch From: Richard Sandiford Subject: [PATCH] data-ref: Fix ranges_maybe_overlap_p test To: gcc-patches@gcc.gnu.org Gcc: private.sent --text follows this line-- dr_may_alias_p rightly used poly_int_tree_p to guard a use of ranges_maybe_overlap_p, but used the non-poly extractors. This caused a few failures in the SVE ACLE asm tests. gcc/ * tree-data-ref.cc (dr_may_alias_p): Use to_poly_widest instead of to_widest. --- gcc/tree-data-ref.cc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc index 91bfb619d66..978c3f002f7 100644 --- a/gcc/tree-data-ref.cc +++ b/gcc/tree-data-ref.cc @@ -2979,10 +2979,10 @@ dr_may_alias_p (const struct data_reference *a, const struct data_reference *b, && operand_equal_p (DR_OFFSET (a), DR_OFFSET (b)) && poly_int_tree_p (tree_size_a) && poly_int_tree_p (tree_size_b) - && !ranges_maybe_overlap_p (wi::to_widest (DR_INIT (a)), - wi::to_widest (tree_size_a), - wi::to_widest (DR_INIT (b)), - wi::to_widest (tree_size_b))) + && !ranges_maybe_overlap_p (wi::to_poly_widest (DR_INIT (a)), + wi::to_poly_widest (tree_size_a), + wi::to_poly_widest (DR_INIT (b)), + wi::to_poly_widest (tree_size_b))) { gcc_assert (integer_zerop (DR_STEP (a)) && integer_zerop (DR_STEP (b))); -- 2.25.1 --=-=-=--