From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: hongjiu.lu@intel.com, binutils@sourceware.org
Subject: Re: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
Date: Mon, 4 Dec 2023 17:30:25 +0100 [thread overview]
Message-ID: <14e62b44-2c5b-4a3b-a66a-cffa8f33578c@suse.com> (raw)
In-Reply-To: <20231124070213.3886483-2-lili.cui@intel.com>
On 24.11.2023 08:02, Cui, Lili wrote:
> @@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
> return t->opcode_modifier.vex || t->opcode_modifier.evex;
> }
>
> +static INLINE bool
> +is_apx_rex2_encoding (void)
> +{
> + return i.rex2 || i.rex2_encoding;
> +}
This function is used just once. Do we really need it? Or else why
don't you use it near the end of md_assemble()?
> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
> i.vex.bytes[3] |= i.mask.reg->reg_num;
> }
>
> +/* Build (2 bytes) rex2 prefix.
> + | D5h |
> + | m | R4 X4 B4 | W R X B |
> +*/
> +static void
> +build_rex2_prefix (void)
> +{
> + /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same. */
How do they handle it the same? (Also I don't think this is useful as
a code comment; it instead belongs in the description imo.)
> + i.vex.length = 2;
> + i.vex.bytes[0] = 0xd5;
> + /* For the W R X B bits, the variables of rex prefix will be reused. */
> + i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> + | (i.rex2 << 4) | i.rex);
> +}
> +
> static void
> process_immext (void)
> {
> @@ -4385,12 +4414,16 @@ optimize_encoding (void)
> i.suffix = 0;
> /* Convert to byte registers. */
> if (i.types[1].bitfield.word)
> - j = 16;
> - else if (i.types[1].bitfield.dword)
> + /* There are 40 8-bit registers. */
> j = 32;
> + else if (i.types[1].bitfield.dword)
> + /* 32 8-bit registers + 32 16-bit registers. */
> + j = 64;
> else
> - j = 48;
> - if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
> + /* 32 8-bit registers + 32 16-bit registers
> + + 32 32-bit registers. */
> + j = 96;
> + if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
> j += 8;
> i.op[1].regs -= j;
> }
I did comment on, in particular, the 8-bit register counts before.
Afaict the comments above are nevertheless unchanged and hence
still not really correct.
> @@ -5576,6 +5615,13 @@ md_assemble (char *line)
> return;
> }
>
> + /* Check for explicit REX2 prefix. */
> + if (i.rex2_encoding)
> + {
> + as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> + return;
> + }
Again I'm pretty sure I pointed out before that i.rex2_encoding reflects
use of {rex2}. Which then the error message should correctly refer to.
> @@ -5615,11 +5661,12 @@ md_assemble (char *line)
> && (i.op[1].regs->reg_flags & RegRex64) != 0)
> || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> - && i.rex != 0))
> + && (i.rex != 0 || i.rex2 != 0)))
> {
> int x;
>
> - i.rex |= REX_OPCODE;
> + if (!i.rex2)
> + i.rex |= REX_OPCODE;
> for (x = 0; x < 2; x++)
> {
> /* Look for 8 bit operand that uses old registers. */
> @@ -5630,7 +5677,7 @@ md_assemble (char *line)
> /* In case it is "hi" register, give up. */
> if (i.op[x].regs->reg_num > 3)
> as_bad (_("can't encode register '%s%s' in an "
> - "instruction requiring REX prefix."),
> + "instruction requiring REX/REX2 prefix."),
> register_prefix, i.op[x].regs->reg_name);
>
> /* Otherwise it is equivalent to the extended register.
> @@ -5642,11 +5689,11 @@ md_assemble (char *line)
> }
> }
>
> - if (i.rex == 0 && i.rex_encoding)
> + if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))
Doesn't this want to be
if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
?
> {
> /* Check if we can add a REX_OPCODE byte. Look for 8 bit operand
> that uses legacy register. If it is "hi" register, don't add
> - the REX_OPCODE byte. */
> + rex and rex2 prefix. */
> int x;
> for (x = 0; x < 2; x++)
> if (i.types[x].bitfield.class == Reg
> @@ -5656,6 +5703,7 @@ md_assemble (char *line)
> {
> gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> i.rex_encoding = false;
> + i.rex2_encoding = false;
> break;
> }
>
> @@ -5663,7 +5711,13 @@ md_assemble (char *line)
> i.rex = REX_OPCODE;
> }
>
> - if (i.rex != 0)
> + if (i.rex2 != 0 || i.rex2_encoding)
> + {
> + build_rex2_prefix ();
> + /* The individual REX.RXBW bits got consumed. */
> + i.rex &= REX_OPCODE;
> + }
> + else if (i.rex != 0)
> add_prefix (REX_OPCODE | i.rex);
>
> insert_lfence_before ();
> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
> /* {rex} */
> i.rex_encoding = true;
> break;
> + case Prefix_REX2:
> + /* {rex2} */
> + i.rex2_encoding = true;
> + break;
> case Prefix_NoOptimize:
> /* {nooptimize} */
> i.no_optimize = true;
> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
> return 0;
> }
>
> +/* Check if Egprs operands are valid for the instruction. */
> +
> +static int
> +check_EgprOperands (const insn_template *t)
> +{
> + if (!t->opcode_modifier.noegpr)
> + return 0;
> +
> + for (unsigned int op = 0; op < i.operands; op++)
> + {
> + if (i.types[op].bitfield.class != Reg
> + /* Special case for (%dx) while doing input/output op */
> + || i.input_output_operand)
Didn't we agree that this extra condition isn't necessary, once the
producer site correctly updates all state (which was supposed to be
done in a small prereq patch)?
> @@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
> /* Do not verify operands when there are none. */
> if (!t->operands)
> {
> - if (VEX_check_encoding (t))
> + /* When there are no operands, we still need to use the
> + check_EgprOperands function to check whether {rex2} is valid. */
> + if (VEX_check_encoding (t) || check_EgprOperands (t))
As before imo either the function name wants changing (so it becomes
reasonable to use here, without the need for a comment explaining the
oddity), or you simply open-code the sole check that is needed here
(afaict: t->opcode_modifier.noegpr && i.rex2_encoding).
> @@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
> continue;
> }
>
> + /* Check if EGRPS operands(r16-r31) are valid. */
EGPR?
> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -217,6 +217,7 @@ accept various extension mnemonics. For example,
> @code{avx10.1/256},
> @code{avx10.1/128},
> @code{user_msr},
> +@code{apx_f},
> @code{amx_int8},
> @code{amx_bf16},
> @code{amx_fp16},
> @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
> instructions (x86-64 only). Note that this differs from the @samp{rex}
> prefix which generates REX prefix unconditionally.
>
> +@item
> +@samp{@{rex2@}} -- encode with REX2 prefix
This isn't in line with what's said for {rex}. Iirc we were in
agreement that we want both to behave consistently. In which case
documentation also needs to describe them consistently.
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> @@ -0,0 +1,86 @@
> +# Check 64bit instructions with rex2 prefix encoding
> +
> + .allow_index_reg
> + .text
> +_start:
> + test $0x7, %r24b
> + test $0x7, %r24d
> + test $0x7, %r24
> + test $0x7, %r24w
> +## REX2.M bit
> + imull %eax, %r15d
> + imull %eax, %r16d
> + punpckldq (%r18), %mm2
> +## REX2.R4 bit
> + leal (%rax), %r16d
> + leal (%rax), %r17d
> + leal (%rax), %r18d
> + leal (%rax), %r19d
> + leal (%rax), %r20d
> + leal (%rax), %r21d
> + leal (%rax), %r22d
> + leal (%rax), %r23d
> + leal (%rax), %r24d
> + leal (%rax), %r25d
> + leal (%rax), %r26d
> + leal (%rax), %r27d
> + leal (%rax), %r28d
> + leal (%rax), %r29d
> + leal (%rax), %r30d
> + leal (%rax), %r31d
> +## REX2.X4 bit
> + leal (,%r16), %eax
> + leal (,%r17), %eax
> + leal (,%r18), %eax
> + leal (,%r19), %eax
> + leal (,%r20), %eax
> + leal (,%r21), %eax
> + leal (,%r22), %eax
> + leal (,%r23), %eax
> + leal (,%r24), %eax
> + leal (,%r25), %eax
> + leal (,%r26), %eax
> + leal (,%r27), %eax
> + leal (,%r28), %eax
> + leal (,%r29), %eax
> + leal (,%r30), %eax
> + leal (,%r31), %eax
> +## REX.B4 bit
Further up you properly say REX2. Here and below it's only REX?
> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> @@ -5,3 +5,61 @@ pseudos:
> {rex} vmovaps %xmm7,%xmm2
> {rex} vmovaps %xmm17,%xmm2
> {rex} rorx $7,%eax,%ebx
> + {rex2} vmovaps %xmm7,%xmm2
> + {rex2} xsave (%rax)
> + {rex2} xsaves (%ecx)
> + {rex2} xsaves64 (%ecx)
> + {rex2} xsavec (%ecx)
> + {rex2} xrstors (%ecx)
> + {rex2} xrstors64 (%ecx)
> +
> + #All opcodes in the row 0xa* prefixed REX2 are illegal.
> + #{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> + {rex2} mov 0x90909090,%al
> + {rex2} movabs 0x1,%al
> + {rex2} cmpsb %es:(%edi),%ds:(%esi)
> + {rex2} lodsb
> + {rex2} lods %ds:(%esi),%al
> + {rex2} lodsb (%esi)
> + {rex2} movs
> + {rex2} movs (%esi), (%edi)
> + {rex2} scasl
> + {rex2} scas %es:(%edi),%eax
> + {rex2} scasb (%edi)
> + {rex2} stosb
> + {rex2} stosb (%edi)
> + {rex2} stos %eax,%es:(%edi)
> +
> + #All opcodes in the row 0x7* prefixed REX2 are illegal.
This also covers map 1 row 8, doesn't it?
> + {rex2} jo .+2-0x70
> + {rex2} jno .+2-0x70
> + {rex2} jb .+2-0x70
> + {rex2} jae .+2-0x70
> + {rex2} je .+2-0x70
> + {rex2} jne .+2-0x70
> + {rex2} jbe .+2-0x70
> + {rex2} ja .+2-0x70
> + {rex2} js .+2-0x70
> + {rex2} jns .+2-0x70
> + {rex2} jp .+2-0x70
> + {rex2} jnp .+2-0x70
> + {rex2} jl .+2-0x70
> + {rex2} jge .+2-0x70
> + {rex2} jle .+2-0x70
> + {rex2} jg .+2-0x70
> +
> + #All opcodes in the row 0x7* prefixed REX2 are illegal.
> + {rex2} in $0x90,%al
> + {rex2} in $0x90
> + {rex2} out $0x90,%al
> + {rex2} out $0x90
> + {rex2} jmp *%eax
> + {rex2} loop foo
Isn't this row 0xE?
> + #All opcodes in the row 0xf3* prefixed REX2 are illegal.
This comment continues to be confusing: 0xf3 is a REP prefix. Perhaps
best to either say "map 1" and omit the "f" or at least write 0x0f3*
or slightly better 0x0f 0x3*.
> --- a/gas/testsuite/gas/i386/x86-64-pseudos.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
> @@ -360,6 +360,19 @@ _start:
> {rex} movaps (%r8),%xmm2
> {rex} phaddw (%rcx),%mm0
> {rex} phaddw (%r8),%mm0
> + {rex2} mov %al,%ah
> + {rex2} shl %cl, %eax
> + {rex2} cmp %cl, %dl
> + {rex2} mov $1, %bl
> + {rex2} movl %eax,%ebx
> + {rex2} movl %eax,%r14d
> + {rex2} movl %eax,(%r8)
> + {rex2} movaps %xmm7,%xmm2
> + {rex2} movaps %xmm7,%xmm12
> + {rex2} movaps (%rcx),%xmm2
> + {rex2} movaps (%r8),%xmm2
> + {rex2} pmullw %mm0,%mm6
> +
>
> movb (%rbp),%al
> {disp8} movb (%rbp),%al
No double blank lines please.
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
Disassembler comments (if any) in a separate (later) mail again.
> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
> "64" },
> { "USER_MSR",
> "64" },
> + { "APX_F",
> + "XSAVE|64" },
> };
>
> /* This array is populated as process_i386_initializers() walks cpu_flags[]. */
> @@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
> BITFIELD (FRED),
> BITFIELD (LKGS),
> BITFIELD (USER_MSR),
> + BITFIELD (APX_F),
> BITFIELD (MWAITX),
> BITFIELD (CLZERO),
> BITFIELD (OSPKE),
> @@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
> BITFIELD (ATTSyntax),
> BITFIELD (IntelSyntax),
> BITFIELD (ISA64),
> + BITFIELD (NoEgpr),
> };
>
> #define CLASS(n) #n, n
> @@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
> return elem_size;
> }
>
> +static bool
> +rex2_disallowed (const unsigned long long opcode, unsigned int space,
> + const char *cpu_flags)
> +{
> + /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD. */
> + if (strcmp (cpu_flags, "XSAVES") >= 0
> + || strcmp (cpu_flags, "XSAVEC") >= 0
> + || strcmp (cpu_flags, "Xsave") >= 0
> + || strcmp (cpu_flags, "Xsaveopt") >= 0)
> + return true;
Wasn't this intended to be dropped, being redundant with the opcode table
attributes?
> + /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> + are reserved under REX2 and triggers #UD when prefixed with REX2 */
> + if (space == 0)
> + switch (opcode >> 4)
Both here and ...
> + {
> + case 0x4:
> + case 0x7:
> + case 0xA:
> + case 0xE:
> + return true;
> + default:
> + return false;
> + }
> +
> + if (space == SPACE_0F)
> + switch (opcode >> 4)
... here, don't you also need to mask off further bits? There are
quite a few opcodes which have a kind-of ModR/M byte encoded directly
in the opcode, for example.
> + {
> + case 0x3:
> + case 0x8:
> + return true;
> + default:
> + return false;
> + }
> +
> + return false;
> +}
> +
> static void
> process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
> unsigned int prefix, const char *extension_opcode,
> - char **opnd, int lineno)
> + char **opnd, int lineno, bool rex2_disallowed)
> {
> char *str, *next, *last;
> bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
> @@ -1202,6 +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
> || modifiers[SAE].value))
> modifiers[EVex].value = EVEXDYN;
>
> + /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
> + For template supports both Vex and EVex allowing EGPR. */
"Templates supporting both Vex and EVex allow EGPR."
> + if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
> + && !modifiers[EVex].value)
> + modifiers[NoEgpr].value = 1;
> +
> output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
> }
>
> @@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
> ident, 2 * (int)length, opcode, end, i);
> free (ident);
>
> + /* Add some specilal handle for current entry. */
> + bool has_special_handle = rex2_disallowed (opcode, space, cpu_flags);
The local variable (if one is needed in the first place) wants naming as
usefully as the function now is named. Similarly the comment would want
improving alonmg those lines.
> process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
> - extension_opcode, operand_types, lineno);
> + extension_opcode, operand_types, lineno,
> + has_special_handle);
>
> process_i386_cpu_flag (table, cpu_flags, NULL, ",", " ", lineno, CpuMax);
>
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -138,6 +138,7 @@
> #define Vsz256 Vsz=VSZ256
> #define Vsz512 Vsz=VSZ512
>
> +
> // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
> // the bit to mark commutative VEX encodings where swapping the source
> // operands may allow to switch from 3-byte to 2-byte VEX encoding.
Stray change (in general please avoid introducing double blank lines, as
those make patch context less useful).
Jan
next prev parent reply other threads:[~2023-12-04 16:30 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-24 7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24 7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30 ` Jan Beulich [this message]
2023-12-05 13:31 ` Cui, Lili
2023-12-06 7:52 ` Jan Beulich
2023-12-06 12:43 ` Cui, Lili
2023-12-07 9:01 ` Jan Beulich
2023-12-08 3:10 ` Cui, Lili
2023-11-24 7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24 7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38 ` Jan Beulich
2023-12-08 15:21 ` Cui, Lili
2023-12-11 8:34 ` Jan Beulich
2023-12-12 10:44 ` Cui, Lili
2023-12-12 11:16 ` Jan Beulich
2023-12-12 12:32 ` Cui, Lili
2023-12-12 12:39 ` Jan Beulich
2023-12-12 13:15 ` Cui, Lili
2023-12-12 14:13 ` Jan Beulich
2023-12-13 7:36 ` Cui, Lili
2023-12-13 7:48 ` Jan Beulich
2023-12-12 12:58 ` Cui, Lili
2023-12-12 14:04 ` Jan Beulich
2023-12-13 8:35 ` Cui, Lili
2023-12-13 9:13 ` Jan Beulich
2023-12-07 13:34 ` Jan Beulich
2023-12-11 6:16 ` Cui, Lili
2023-12-11 8:43 ` Jan Beulich
2023-12-11 11:50 ` Jan Beulich
2023-11-24 7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05 ` Jan Beulich
2023-12-11 6:16 ` Cui, Lili
2023-12-11 8:55 ` Jan Beulich
2023-11-24 7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12 ` Jan Beulich
2023-12-11 13:36 ` Cui, Lili
2023-12-11 16:50 ` Jan Beulich
2023-12-13 10:42 ` Cui, Lili
2024-03-22 10:02 ` Jan Beulich
2024-03-22 10:31 ` Jan Beulich
2024-03-26 2:04 ` Cui, Lili
2024-03-26 7:06 ` Jan Beulich
2024-03-26 7:18 ` Cui, Lili
2024-03-22 10:59 ` Jan Beulich
2024-03-26 8:22 ` Cui, Lili
2024-03-26 9:30 ` Jan Beulich
2024-03-27 2:41 ` Cui, Lili
2023-12-08 14:27 ` Jan Beulich
2023-12-12 5:53 ` Cui, Lili
2023-12-12 8:28 ` Jan Beulich
2023-11-24 7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17 ` Jan Beulich
2023-12-15 8:38 ` Cui, Lili
2023-12-15 8:44 ` Jan Beulich
2023-11-24 7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27 ` Jan Beulich
2023-12-12 3:18 ` Hu, Lin1
2023-12-12 8:41 ` Jan Beulich
2023-12-13 5:31 ` Hu, Lin1
2023-12-12 8:45 ` Jan Beulich
2023-12-13 6:06 ` Hu, Lin1
2023-12-13 8:19 ` Jan Beulich
2023-12-13 8:34 ` Hu, Lin1
2023-11-24 7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24 7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22 ` Cui, Lili
2023-11-24 12:14 ` Jan Beulich
2023-12-12 2:57 ` Lu, Hongjiu
2023-12-12 8:16 ` Cui, Lili
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14e62b44-2c5b-4a3b-a66a-cffa8f33578c@suse.com \
--to=jbeulich@suse.com \
--cc=binutils@sourceware.org \
--cc=hongjiu.lu@intel.com \
--cc=lili.cui@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).