From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by sourceware.org (Postfix) with ESMTPS id 74C1E3858020 for ; Wed, 6 Dec 2023 07:52:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 74C1E3858020 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 74C1E3858020 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::432 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701849158; cv=none; b=XxB1H1bps3lUax0J8MczRpIJF/Gvp1hAZU+mbmOjmsCINYytO/7kDXevfPbTQ21VKQUoFWBFVaXC3PjvEoLsjw2LVrWPD6R1T4btAy2qZ2K2I3tQETV8vSWOZpxzOyQ2P9YUxAo0Y+oM7NbYGz2XGF8zUY8OO1ZEZL9oc8aPdZ8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701849158; c=relaxed/simple; bh=hC/r1r4BKhHoYhGYSk6tR3Jh0iGbUHD16OMOjaWNS9s=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=IEGZNPcQkzWo/ZbYnMLsBvaTrUjxecUku9ixxkZHNBwOohcvBFqrc0OIesiDXbd8VY95UhmpYL7gi9XCFe1foWKHXEAgzqqnIe58O01E2b6du6EgK3zr7JziXSVtHki2lBRJLyGURmc+AQshu1Huwm5lbJ/gJZckZOoalEQm1k4= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wr1-x432.google.com with SMTP id ffacd0b85a97d-3334a701cbbso489372f8f.0 for ; Tue, 05 Dec 2023 23:52:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1701849154; x=1702453954; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=qJUqyPgkyuv7HPq7Q5cGs1xA5PQa+h5RtyYxVcSJlqo=; b=gvNVkjcSWr8I/7x520GA0KDTY2vzybFC66beJaJGp/37DanJIHdxMoRg10lTujvhdh zOmvjakGCoXN9OBJoi65/xLTuEaelPlS11NPTwuhliw6c6mXI2UAZJly7gxs28qXyK/L IkRH2HzdP0/MKUS/Szf6ir3QDPaD0m8WXfzhvvSijMJChygxGPG7nf7urYmUKLlaPMM7 8JBSq828TXf4JejpfHHNqBvEmDSzS4J9BNaPKYcUk67HoKyNjAFDnzDaapsc8icOweBF EIxK/Pcr5Rn/erFn33b22Wu0tBgTn7mwr+4ZJN97JSLgVDgTjoUzsTDafbHsmCMJNWzm woNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701849154; x=1702453954; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qJUqyPgkyuv7HPq7Q5cGs1xA5PQa+h5RtyYxVcSJlqo=; b=tw/t2uRhIKLNQgvroMrAOrleuUReAQhCNgDqrBweZBDarJqcXNjVga3gsugAA0MxTB AZMJBGG4oXEicaFu8LYRZZ9gF8oSNQf1wygfjycIiE6a/EcSVUkyGO5Q0yKzDPJlSRSC X5LJtpx55iR6UZHBJT5nbFESzT9thpPsfVqE2K6Q5zQvnmJ21WO7ZWLT5q3FwYiivsPa DQX8A33DHEr+aaqwOmDeI3VIXHWwZKOAKUMXyhiUHx9Gw5v0rDWYbgXpUYHIz/NDqewn HhYGvKta3nQQMWrTLH0P9s9mBNd3MIpt0DpRQgIVV4wnTZk+F+B7VdO76D1FGzAgr/8I bnAw== X-Gm-Message-State: AOJu0YyUk2XDFdDUKzBskBX9f3WAjJMCy72AEXYOWQreSXw+aC0LTjZk 67pyHPvumtR3HZW0PDjZ3/UTr1CKXuxRFvXaRw2H X-Google-Smtp-Source: AGHT+IHhd4B3z8xLJPR7SjKTX0lQr4DSNFkkpWc+kN4tJT7ttWiMT+ImFeu035flkQrR1/NQ0UC62Q== X-Received: by 2002:a05:600c:3507:b0:40b:5e59:99b5 with SMTP id h7-20020a05600c350700b0040b5e5999b5mr221623wmq.213.1701849153963; Tue, 05 Dec 2023 23:52:33 -0800 (PST) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id f9-20020adff8c9000000b003333d2c6420sm10530478wrq.74.2023.12.05.23.52.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 05 Dec 2023 23:52:33 -0800 (PST) Message-ID: <9578ce1e-233e-4c9a-8003-a619a685c236@suse.com> Date: Wed, 6 Dec 2023 08:52:33 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Content-Language: en-US To: "Cui, Lili" Cc: "Lu, Hongjiu" , "binutils@sourceware.org" References: <20231124070213.3886483-1-lili.cui@intel.com> <20231124070213.3886483-2-lili.cui@intel.com> <14e62b44-2c5b-4a3b-a66a-cffa8f33578c@suse.com> From: Jan Beulich Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3026.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 05.12.2023 14:31, Cui, Lili wrote: >> On 24.11.2023 08:02, Cui, Lili wrote: >>> @@ -4120,6 +4134,21 @@ build_evex_prefix (void) >>> i.vex.bytes[3] |= i.mask.reg->reg_num; } >>> >>> +/* Build (2 bytes) rex2 prefix. >>> + | D5h | >>> + | m | R4 X4 B4 | W R X B | >>> +*/ >>> +static void >>> +build_rex2_prefix (void) >>> +{ >>> + /* Rex2 reuses i.vex because they handle i.tm.opcode_space the >>> +same. */ >> >> How do they handle it the same? (Also I don't think this is useful as a code >> comment; it instead belongs in the description imo.) >> > > Moved the comment to the functions description. > > /* Build (2 bytes) rex2 prefix. > | D5h | > | m | R4 X4 B4 | W R X B | > > Rex2 reuses i.vex as they handle i.tm.opcode_space the same way. */ > static void > build_rex2_prefix (void) > > > In function "output_insn", some handle like this. > > if (!i.vex.length) > switch (i.tm.opcode_space) > { > case SPACE_BASE: > break; > case SPACE_0F: > ++j; > break; > case SPACE_0F38: > case SPACE_0F3A: > j += 2; > break; > default: > abort (); > } > ..... > if (!i.vex.length > && i.tm.opcode_space != SPACE_BASE) > { > *p++ = 0x0f; > if (i.tm.opcode_space != SPACE_0F) > *p++ = i.tm.opcode_space == SPACE_0F38 > ? 0x38 : 0x3a; > } Oh, I see. That's pretty remote. How about replacing "the same way"? Perhaps "Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes"? While in that form it's fine to remain in a code comment, just a general clarification: When I say something wants saying in the "description", it's (almost) always that I mean the patch description, not anything else. >>> @@ -4385,12 +4414,16 @@ optimize_encoding (void) >>> i.suffix = 0; >>> /* Convert to byte registers. */ >>> if (i.types[1].bitfield.word) >>> - j = 16; >>> - else if (i.types[1].bitfield.dword) >>> + /* There are 40 8-bit registers. */ >>> j = 32; >>> + else if (i.types[1].bitfield.dword) >>> + /* 32 8-bit registers + 32 16-bit registers. */ >>> + j = 64; >>> else >>> - j = 48; >>> - if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4) >>> + /* 32 8-bit registers + 32 16-bit registers >>> + + 32 32-bit registers. */ >>> + j = 96; >>> + if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum >>> +< 4) >>> j += 8; >>> i.op[1].regs -= j; >>> } >> >> I did comment on, in particular, the 8-bit register counts before. >> Afaict the comments above are nevertheless unchanged and hence still not >> really correct. >> > > Changed to : > > if (flag_code == CODE_64BIT || base_regnum < 4) > { > i.types[1].bitfield.byte = 1; > /* Ignore the suffix. */ > i.suffix = 0; > /* Convert to byte registers. 8-bit registers are special, > RegRex64 and non-RegRex64 each have 8 registers. */ > if (i.types[1].bitfield.word) > /* 32 (or 40) 8-bit registers. */ > j = 32; > else if (i.types[1].bitfield.dword) > /* 32 (or 40)8-bit registers + 32 16-bit registers. */ Nit: Missing blank. > j = 64; > else > /* 32 (or 40) 8-bit registers + 32 16-bit registers > + 32 32-bit registers. */ > j = 96; > > if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4) > j += 8; > i.op[1].regs -= j; > } I won't insist on further changes, but imo as you're adding comments, also adding a comment to this last if() (which finally takes care of the 8-bit reg special case) would be advisable. >>> @@ -5663,7 +5711,13 @@ md_assemble (char *line) >>> i.rex = REX_OPCODE; >>> } >>> >>> - if (i.rex != 0) >>> + if (i.rex2 != 0 || i.rex2_encoding) >>> + { >>> + build_rex2_prefix (); >>> + /* The individual REX.RXBW bits got consumed. */ >>> + i.rex &= REX_OPCODE; >>> + } >>> + else if (i.rex != 0) >>> add_prefix (REX_OPCODE | i.rex); >>> >>> insert_lfence_before (); >>> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, >> bool prefix_only) >>> /* {rex} */ >>> i.rex_encoding = true; >>> break; >>> + case Prefix_REX2: >>> + /* {rex2} */ >>> + i.rex2_encoding = true; >>> + break; >>> case Prefix_NoOptimize: >>> /* {nooptimize} */ >>> i.no_optimize = true; >>> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t) >>> return 0; >>> } >>> >>> +/* Check if Egprs operands are valid for the instruction. */ >>> + >>> +static int >>> +check_EgprOperands (const insn_template *t) { >>> + if (!t->opcode_modifier.noegpr) >>> + return 0; >>> + >>> + for (unsigned int op = 0; op < i.operands; op++) >>> + { >>> + if (i.types[op].bitfield.class != Reg >>> + /* Special case for (%dx) while doing input/output op */ >>> + || i.input_output_operand) >> >> Didn't we agree that this extra condition isn't necessary, once the producer >> site correctly updates all state (which was supposed to be done in a small >> prereq patch)? >> > > I tried adding "Unspecified | BaseIndex" to the InOutPortReg, then some related instructions had two memory operands, so it raised a lot of invalid test case fail, and more ugly code needed to be added. In the end, I felt that this simple modification might be better. Changing InOutPortReg of course isn't going to be easy. But that also wasn't what we had discussed. Instead (I thought) we agreed on ... > @@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string) > && !operand_type_check (i.types[this_operand], disp)) > { > i.types[this_operand] = i.base_reg->reg_type; > + i.types[this_operand].bitfield.class = 0; > i.input_output_operand = true; > return 1; amending this code to also correctly set i.op[].regs. Perhaps it would also be best to actually clear i.base_reg (for there not being any memory operand). (FTAOD: All of this in a separate prereq patch, not here. The code creating inconsistent state has been a [latent] bug for a long time.) >>> --- a/gas/doc/c-i386.texi >>> +++ b/gas/doc/c-i386.texi >>> @@ -217,6 +217,7 @@ accept various extension mnemonics. For example, >>> @code{avx10.1/256}, @code{avx10.1/128}, @code{user_msr}, >>> +@code{apx_f}, >>> @code{amx_int8}, >>> @code{amx_bf16}, >>> @code{amx_fp16}, >>> @@ -983,6 +984,9 @@ Different encoding options can be specified via >> pseudo prefixes: >>> instructions (x86-64 only). Note that this differs from the >>> @samp{rex} prefix which generates REX prefix unconditionally. >>> >>> +@item >>> +@samp{@{rex2@}} -- encode with REX2 prefix >> >> This isn't in line with what's said for {rex}. Iirc we were in agreement that we >> want both to behave consistently. In which case documentation also needs to >> describe them consistently. >> > > Changed to > > @item > @samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector > instructions (APX_F only). Note that this differs from the @samp{rex2} > prefix which generates REX2 prefix unconditionally. Except there's no "rex2" prefix according to the present implementation. >>> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s >>> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s >>> @@ -5,3 +5,61 @@ pseudos: >>> {rex} vmovaps %xmm7,%xmm2 >>> {rex} vmovaps %xmm17,%xmm2 >>> {rex} rorx $7,%eax,%ebx >>> + {rex2} vmovaps %xmm7,%xmm2 >>> + {rex2} xsave (%rax) >>> + {rex2} xsaves (%ecx) >>> + {rex2} xsaves64 (%ecx) >>> + {rex2} xsavec (%ecx) >>> + {rex2} xrstors (%ecx) >>> + {rex2} xrstors64 (%ecx) >>> + >>> + #All opcodes in the row 0xa* prefixed REX2 are illegal. >>> + #{rex2} test (0xa8) is a special case, it will remap to test (0xf6) >>> + {rex2} mov 0x90909090,%al >>> + {rex2} movabs 0x1,%al >>> + {rex2} cmpsb %es:(%edi),%ds:(%esi) >>> + {rex2} lodsb >>> + {rex2} lods %ds:(%esi),%al >>> + {rex2} lodsb (%esi) >>> + {rex2} movs >>> + {rex2} movs (%esi), (%edi) >>> + {rex2} scasl >>> + {rex2} scas %es:(%edi),%eax >>> + {rex2} scasb (%edi) >>> + {rex2} stosb >>> + {rex2} stosb (%edi) >>> + {rex2} stos %eax,%es:(%edi) >>> + >>> + #All opcodes in the row 0x7* prefixed REX2 are illegal. >> >> This also covers map 1 row 8, doesn't it? >> > > No, I didn't find 0xf8* in opcode table. Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or wait, depends on what "opcode table" here means: The manual's or opcodes/i386-opc.tbl? The latter of course doesn't have them, as they're ... >>> + {rex2} jo .+2-0x70 >>> + {rex2} jno .+2-0x70 >>> + {rex2} jb .+2-0x70 >>> + {rex2} jae .+2-0x70 >>> + {rex2} je .+2-0x70 >>> + {rex2} jne .+2-0x70 >>> + {rex2} jbe .+2-0x70 >>> + {rex2} ja .+2-0x70 >>> + {rex2} js .+2-0x70 >>> + {rex2} jns .+2-0x70 >>> + {rex2} jp .+2-0x70 >>> + {rex2} jnp .+2-0x70 >>> + {rex2} jl .+2-0x70 >>> + {rex2} jge .+2-0x70 >>> + {rex2} jle .+2-0x70 >>> + {rex2} jg .+2-0x70 ... the disp32/disp16 forms of these branches, which are created only during relaxation. >>> + /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8* >>> + are reserved under REX2 and triggers #UD when prefixed with REX2 >>> + */ if (space == 0) >>> + switch (opcode >> 4) >> >> Both here and ... >> >>> + { >>> + case 0x4: >>> + case 0x7: >>> + case 0xA: >>> + case 0xE: >>> + return true; >>> + default: >>> + return false; >>> + } >>> + >>> + if (space == SPACE_0F) >>> + switch (opcode >> 4) >> >> ... here, don't you also need to mask off further bits? There are quite a few >> opcodes which have a kind-of ModR/M byte encoded directly in the opcode, >> for example. >> > > Thanks for reminding. Added the code like this. > > /* Some opcodes encode a ModR/M byte directly in the opcode. */ > unsigned long long > base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode; Can length be 0? I didn't think so, and then base_opcode = opcode >> (8 * length - 8); would be all you need. Also in the comment, I think it would be slightly better to say "ModR/M-like byte". Jan