From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by sourceware.org (Postfix) with ESMTPS id 1161E3858D28 for ; Wed, 28 Sep 2022 02:19:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1161E3858D28 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=huawei.com Received: from dggpemm500021.china.huawei.com (unknown [172.30.72.57]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4Mcg7q2H32z1P6np; Wed, 28 Sep 2022 10:15:35 +0800 (CST) Received: from dggpemm500002.china.huawei.com (7.185.36.229) by dggpemm500021.china.huawei.com (7.185.36.109) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 28 Sep 2022 10:19:50 +0800 Received: from [10.67.76.249] (10.67.76.249) by dggpemm500002.china.huawei.com (7.185.36.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 28 Sep 2022 10:19:50 +0800 Subject: Re: [PATCH RESEND v2] Aarch64: Allow explicit size specifier for predicate operand of {sq, uq, }{incp, decp} To: Shaokun Zhang via Binutils , Shaokun Zhang , Jingtao Cai , "Richard Earnshaw" , Marcus Shawcroft , , Jan Beulich References: <20220216005311.26184-1-zhangshaokun@hisilicon.com> From: "dongbo (E)" Message-ID: Date: Wed, 28 Sep 2022 10:19:50 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Originating-IP: [10.67.76.249] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To dggpemm500002.china.huawei.com (7.185.36.229) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-14.2 required=5.0 tests=BAYES_00,BODY_8BITS,GIT_PATCH_0,KAM_DMARC_STATUS,KAM_MANYTO,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, thank you all for the replies. On 2022/9/12 16:17, Richard Sandiford wrote: > Hi, > > Thanks for the patch and sorry for the slow reply. For the record, > I'm also not an aarch64 maintainer, but Richard E and I were talking > about this last week. > > Shaokun Zhang via Binutils writes: >> From: Jingtao Cai >> >> Omitting predicate size specifier in vector form of {sq, uq, }{decp, incp} is deprecated and will be prohibited in a future release of the aarch64, >> see https://developer.arm.com/documentation/ddi0602/2021-09/SVE-Instructions/DECP--vector---Decrement-vector-by-count-of-true-predicate-elements-. >> >> This allows explicit size specifier, e.g. `decp z0.h, p0.h`, for predicate operand of these SVE instructions. >> The exsiting behaviour of not requiring the specifier is preserved, but will be warned. >> And the disasembly is with the specifier with this patch. > The patch is definitely doing the right thing by the architecture > definition. However, given that the old syntax has been in use for > a while, it might be quite disruptive to deprecate the old syntax > now in the GNU tools. When Richard and I discussed the various ways > of coping with the deprecation in GCC, none of the options seemed > particularly appealing. > > It's extremely unlikely that the old syntax would be repurposed to mean > something else. Therefore, Richard and I thought that it might be better > to continue to accept the old syntax alongside the new syntax in the GNU > tools, as a GNU extension to the official assembler spec, without the > deprecation message. Sorry to raise this after so long, and after you'd > already done the work. OK, I think we can delete the warning to make this done. >> diff --git a/opcodes/aarch64-tbl.h b/opcodes/aarch64-tbl.h >> index cb039d63eba3..794e2d24f528 100644 >> --- a/opcodes/aarch64-tbl.h >> +++ b/opcodes/aarch64-tbl.h >> @@ -4200,6 +4200,7 @@ const struct aarch64_opcode aarch64_opcode_table[] = >> _SVE_INSN ("decd", 0x04f0e400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> _SVE_INSNC ("dech", 0x0470c400, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("dech", 0x0470e400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> + _SVE_INSNC ("decp", 0x252d8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSNC ("decp", 0x252d8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), > I haven't tested whether this works, but rather than have two entries per > instruction, I think it'd be better to add a new OP_SVE_ macro that lists > both the qualified and unqualified forms. The current naming scheme is: > > /* The naming convention for SVE macros is: > > OP_SVE_[_]* > > contains one character per operand, using the following scheme: > > - U: the operand is unqualified (NIL). > > - [BHSD]: the operand has a S_[BHSD] qualifier and the choice of > qualifier is the same for all variants. This is used for both > .[BHSD] suffixes on an SVE predicate or vector register and > scalar FPRs of the form [BHSD]. > > - [WX]: the operand has a [WX] qualifier and the choice of qualifier > is the same for all variants. > > - [ZM]: the operand has a /[ZM] suffix and the choice of suffix > is the same for all variants. > > - V: the operand has a S_[BHSD] qualifier and the choice of qualifier > is not the same for all variants. > > - R: the operand has a [WX] qualifier and the choice of qualifier is > not the same for all variants. > > - P: the operand has a /[ZM] suffix and the choice of suffix is not > the same for all variants. > > The _, if present, give the subset of [BHSD] that are accepted > by the V entries in . */ > > so maybe we could use a lower case "v" to represent "like V, but the > qualifier is optional". That is: > > #define OP_SVE_Vv_HSD \ > { \ > QLF2(S_H,NIL), \ > QLF2(S_S,NIL), \ > QLF2(S_D,NIL), \ > QLF2(S_H,S_H), \ > QLF2(S_S,S_S), \ > QLF2(S_D,S_D), \ > } > > (Not sure how we ended up with two definitions of OP_SVE_VU_HSD > in aarch64-tbl.h -- oops.) We tested the `OP_SVE_Vv_HSD` above. For `decp z0.h, p0.h`, we also need to modify `sve_size_hsd` encoding: ``` opcodes/aarch64-asm.c:     aarch64_encode_variant_using_iclass (struct aarch64_inst *inst) {         ...         case sve_size_hsd:  -            insert_field (FLD_size, &inst->value, aarch64_get_variant (inst) + 1, 0); +            insert_field (FLD_size, &inst->value, aarch64_get_variant (inst) % 3 + 1, 0);             break;         ...     } ``` Both `decp z0.h, p0` and `decp z0.h, p0.h` can be encoded correctly. However, what bothers us is `decp` instructions are disassembled to unqualified versions, i.e. `decp z0.h, p0`, because `aarch64_decode_variant_using_iclass` chooses unqualified variant: ```     aarch64_decode_variant_using_iclass (aarch64_inst *inst) {         ....         case sve_size_hsd:             i = extract_field (FLD_size, inst->value, 0);             if (i < 1)                 return false;             variant = i - 1;             break;         ...     } ``` It's better to get `decp z0.h, p0.h`, right? We tried to put `S_H` in front of `NIL`: ```     #define OP_SVE_Vv_HSD                        \ {                                                               \       QLF2(S_H,S_H),                                      \       QLF2(S_S,S_S),                                       \       QLF2(S_D,S_D),                                      \       QLF2(S_H,NIL),                                       \       QLF2(S_S,NIL),                                       \       QLF2(S_D,NIL),                                       \     } ``` But assembler will fail in `match_operands_qualifier` :(. ```     match_operands_qualifier (aarch64_inst *inst, bool update_p)     {         ...         if (!aarch64_find_best_match (...))         ...         if (inst->opcode->flags & F_STRICT)         {             /* Require an exact qualifier match, even for NIL qualifiers.  */             nops = aarch64_num_of_operands (inst->opcode);             for (i = 0; i < nops; ++i)                 if (inst->operands[i].qualifier != qualifiers[i])                     return false;         }     } ``` We suggest to keep two entries for these instructions like we proposed in previously. Seems it is the cleanest way. How do you think? Best Regards, Dong Bo > Thanks, > Richard > >> _SVE_INSN ("decp", 0x252d8800, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_XV_BHSD, 0, 0), >> _SVE_INSNC ("decw", 0x04b0c400, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_SU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> @@ -4325,6 +4326,7 @@ const struct aarch64_opcode aarch64_opcode_table[] = >> _SVE_INSN ("incd", 0x04f0e000, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> _SVE_INSNC ("inch", 0x0470c000, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("inch", 0x0470e000, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> + _SVE_INSNC ("incp", 0x252c8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSNC ("incp", 0x252c8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("incp", 0x252c8800, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_XV_BHSD, 0, 0), >> _SVE_INSNC ("incw", 0x04b0c000, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_SU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> @@ -4688,6 +4690,7 @@ const struct aarch64_opcode aarch64_opcode_table[] = >> _SVE_INSNC ("sqdech", 0x0460c800, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("sqdech", 0x0470f800, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> _SVE_INSN ("sqdech", 0x0460f800, 0xfff0fc00, sve_misc, 0, OP3 (Rd, Rd, SVE_PATTERN_SCALED), OP_SVE_XWU, F_OPD2_OPT | F_DEFAULT(31), 1), >> + _SVE_INSNC ("sqdecp", 0x252a8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSNC ("sqdecp", 0x252a8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("sqdecp", 0x252a8c00, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_XV_BHSD, 0, 0), >> _SVE_INSN ("sqdecp", 0x252a8800, 0xff3ffe00, sve_size_bhsd, 0, OP3 (Rd, SVE_Pg4_5, Rd), OP_SVE_XVW_BHSD, 0, 2), >> @@ -4702,6 +4705,7 @@ const struct aarch64_opcode aarch64_opcode_table[] = >> _SVE_INSNC ("sqinch", 0x0460c000, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("sqinch", 0x0470f000, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> _SVE_INSN ("sqinch", 0x0460f000, 0xfff0fc00, sve_misc, 0, OP3 (Rd, Rd, SVE_PATTERN_SCALED), OP_SVE_XWU, F_OPD2_OPT | F_DEFAULT(31), 1), >> + _SVE_INSNC ("sqincp", 0x25288000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSNC ("sqincp", 0x25288000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("sqincp", 0x25288c00, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_XV_BHSD, 0, 0), >> _SVE_INSN ("sqincp", 0x25288800, 0xff3ffe00, sve_size_bhsd, 0, OP3 (Rd, SVE_Pg4_5, Rd), OP_SVE_XVW_BHSD, 0, 2), >> @@ -4836,6 +4840,7 @@ const struct aarch64_opcode aarch64_opcode_table[] = >> _SVE_INSNC ("uqdech", 0x0460cc00, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("uqdech", 0x0460fc00, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_WU, F_OPD1_OPT | F_DEFAULT(31), 0), >> _SVE_INSN ("uqdech", 0x0470fc00, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> + _SVE_INSNC ("uqdecp", 0x252b8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSNC ("uqdecp", 0x252b8000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("uqdecp", 0x252b8800, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_WV_BHSD, 0, 0), >> _SVE_INSN ("uqdecp", 0x252b8c00, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_XV_BHSD, 0, 0), >> @@ -4850,6 +4855,7 @@ const struct aarch64_opcode aarch64_opcode_table[] = >> _SVE_INSNC ("uqinch", 0x0460c400, 0xfff0fc00, sve_misc, 0, OP2 (SVE_Zd, SVE_PATTERN_SCALED), OP_SVE_HU, F_OPD1_OPT | F_DEFAULT(31), C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("uqinch", 0x0460f400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_WU, F_OPD1_OPT | F_DEFAULT(31), 0), >> _SVE_INSN ("uqinch", 0x0470f400, 0xfff0fc00, sve_misc, 0, OP2 (Rd, SVE_PATTERN_SCALED), OP_SVE_XU, F_OPD1_OPT | F_DEFAULT(31), 0), >> + _SVE_INSNC ("uqincp", 0x25298000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VV_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSNC ("uqincp", 0x25298000, 0xff3ffe00, sve_size_hsd, 0, OP2 (SVE_Zd, SVE_Pg4_5), OP_SVE_VU_HSD, 0, C_SCAN_MOVPRFX, 0), >> _SVE_INSN ("uqincp", 0x25298800, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_WV_BHSD, 0, 0), >> _SVE_INSN ("uqincp", 0x25298c00, 0xff3ffe00, sve_size_bhsd, 0, OP2 (Rd, SVE_Pg4_5), OP_SVE_XV_BHSD, 0, 0),