From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
"ccoutant@gmail.com" <ccoutant@gmail.com>,
"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
Date: Mon, 6 Nov 2023 15:20:39 +0000 [thread overview]
Message-ID: <SJ0PR11MB560042CC079FE35596CD34E49EAAA@SJ0PR11MB5600.namprd11.prod.outlook.com> (raw)
In-Reply-To: <9373c79d-85bb-15bd-4501-7634687d9a8e@suse.com>
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -406,6 +409,11 @@ struct _i386_insn
> > /* Compressed disp8*N attribute. */
> > unsigned int memshift;
> >
> > + /* No CSPAZO flags update.*/
> > + bool has_nf;
> > +
> > + bool has_zero_upper;
> > +
>
> Can both please be introduced when they're needed, not randomly ahead of
> time?
Moved has_nf to patch 2/8 and deleted has_zero_upper.
> > @@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
> > if (r->reg_flags & RegRex)
> > nr += 8;
> >
> > + if (r->reg_flags & RegRex2)
> > + nr += 16;
> > +
> > if (r->reg_flags & RegVRex)
> > nr += 16;
>
> Perhaps fold to
>
> if (r->reg_flags & (RegVRex | RegRex2))
> nr += 16;
>
> ? Irrespective an assertion may be worthwhile that both flags aren't set at the
> same time?
Done.
>
> > @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> > i.vex.bytes[3] |= i.mask.reg->reg_num; }
> >
> > +/* Build (2 bytes) rex2 prefix.
> > + | D5h |
> > + | m | R4 X4 B4 | W R X B |
> > +*/
> > +static void
> > +build_rex2_prefix (void)
> > +{
> > + i.vex.length = 2;
> > + i.vex.bytes[0] = 0xd5;
> > + i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> > + | (i.rex2 << 4) | i.rex);
> > +}
>
> I may have asked on v1 already: For emitting REX we don't resort to (ab)using
> i.vex. Is that really necessary? (If so, a comment next to the field declaration
> may be warranted.)
>
Added comment for it.
/* For the W R X B bits, the variables of rex prefix will be reused. */
i.vex.bytes[1] = ((i.tm.opcode_space << 7)
| (i.rex2 << 4) | i.rex);
> Speaking of v1: Can you please make sure you have correct version tags on
> submissions of updated patch versions?
>
I used git to send all the patches at once( git send-email --cover-letter --annotate --to="..." -8), which only has the opportunity to change the version of the cover letter patch. To change the version of each patch, I can send them one by one next time. By the way, do you have a better way? Or how did you modify them? Thanks.
> > @@ -4423,12 +4460,16 @@ optimize_encoding (void)
> > i.suffix = 0;
> > /* Convert to byte registers. */
> > if (i.types[1].bitfield.word)
> > - j = 16;
> > - else if (i.types[1].bitfield.dword)
> > + /* There are 32 8-bit registers. */
>
> Please make sure comments are actually correct. With your additions there
> are 40 8-bit registers; prior to that there were 24. The j += 8 further down deal
> with that difference, and the comment here (if one is to be added) wants to
> tell the full truth.
>
Done.
> > @@ -5278,6 +5319,9 @@ md_assemble (char *line)
> > case register_type_mismatch:
> > err_msg = _("register type mismatch");
> > break;
> > + case register_type_of_address_mismatch:
> > + err_msg = _("register type of address mismatch");
> > + break;
>
> I have a concern with wording / naming here: If I saw this in an error message,
> I wouldn't know what is meant. Maybe something along the lines of "cannot
> use an extended GPR for addressing"? And then the enumerator suitabley
> renamed as well?
>
Changed to
+ case unsupported_EGPR_for_addressing:
+ err_msg = _("unsupported EGPR for addressing");
+ break;
> > @@ -5578,7 +5625,7 @@ md_assemble (char *line)
> > as_warn (_("translating to `%sp'"), insn_name (&i.tm));
> > }
> >
> > - if (is_any_vex_encoding (&i.tm))
> > + if (is_any_vex_encoding (&i.tm))
> > {
>
> Stray change, breaking indentation?
>
Done.
> > @@ -5594,6 +5641,13 @@ md_assemble (char *line)
> > return;
> > }
> >
> > + /* Check for explicit REX2 prefix. */
> > + if (i.rex2 || i.rex2_encoding)
>
> This open-codes is_any_apx_rex2_encoding(). But read on.
>
> > + {
> > + as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
>
> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is what case
> the i.rex2 check above is intended to cover. Error message comment, and
> condition want to reflect that.
>
Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase for it.
{rex} vmovaps %xmm7,%xmm2
{rex} vmovaps %xmm17,%xmm2
{rex} rorx $7,%eax,%ebx
+ {rex2} vmovaps %xmm7,%xmm2
> > @@ -5633,11 +5687,11 @@ md_assemble (char *line)
> > && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > - && i.rex != 0))
> > + && (i.rex != 0 || i.rex2 != 0)))
> > {
> > int x;
> > -
> > - i.rex |= REX_OPCODE;
>
> Please don't remove blank lines like this.
>
Done.
> > @@ -5647,9 +5701,11 @@ md_assemble (char *line)
> > gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> > /* In case it is "hi" register, give up. */
> > if (i.op[x].regs->reg_num > 3)
> > - as_bad (_("can't encode register '%s%s' in an "
> > - "instruction requiring REX prefix."),
> > - register_prefix, i.op[x].regs->reg_name);
> > + {
> > + as_bad (_("can't encode register '%s%s' in an "
> > + "instruction requiring REX/REX2 prefix."),
> > + register_prefix, i.op[x].regs->reg_name);
> > + }
>
> There's no need to introduce braces here. Without doing so this will also be
> less of a change.
>
Done.
> > @@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
> > return 0;
> > }
> >
> > +/* Check if Egprs operands are valid for the instruction. */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t) {
> > + if (t->opcode_modifier.noegpr)
> > + {
>
> This scope effectively covers the entire function. Did you consider
>
> if (!t->opcode_modifier.noegpr)
> return 0;
>
> to aid readability?
>
Done.
> > + for (unsigned int op = 0; op < i.operands; op++)
> > + {
> > + if (i.types[op].bitfield.class != Reg
> > + /* Special case for (%dx) while doing input/output op */
> > + || i.input_output_operand)
>
> Why is this needed? The register table entry for %dx ...
>
> > + continue;
> > +
> > + if (i.op[op].regs->reg_flags & RegRex2)
>
> ... doesn't have this bit set anyway.
>
For this special case i.op is empty, we need continue, otherwise r i.op[op].regs->reg_flags will cause segment fault.
> > + {
> > + i.error = register_type_mismatch;
> > + return 1;
> > + }
> > + }
> > +
> > + if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> > + || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> > + {
> > + i.error = register_type_of_address_mismatch;
> > + return 1;
> > + }
> > +
> > + /* Check pseudo prefix {rex2} are valid. */
> > + if (i.rex2_encoding)
> > + {
> > + i.error = invalid_pseudo_prefix;
> > + return 1;
> > + }
>
> Further up in md_assemble() {rex} or {rex2} is simply ignored when wrong to
> apply. Why would an inapplicable {rex2} be treated as an error here? This
> would then also ...
>
> > @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> > /* Do not verify operands when there are none. */
> > if (!t->operands)
> > {
> > - if (VEX_check_encoding (t))
> > + if (VEX_check_encoding (t) || check_EgprOperands (t))
> > {
> > specific_error = progress (i.error);
> > continue;
>
> ... eliminate the need for this change, which is kind of bogus anyway:
> There are no operands here, so calling a function of the given name is at least
> suspicious.
>
We have these tests and I'm confused whether to remove them or not.
+ #All opcodes in the row 0xf3* prefixed REX2 are illegal.
+ {rex2} wrmsr
+ {rex2} rdtsc
+ {rex2} rdmsr
+ {rex2} sysenter
+ {rex2} sysexitl
+ {rex2} rdpmc
> > @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
> > i.vec_encoding = vex_encoding_error;
> > }
> >
> > + if (r->reg_flags & RegRex2)
> > + {
> > + if (!cpu_arch_flags.bitfield.cpuapx_f
> > + || flag_code != CODE_64BIT)
> > + return false;
> > + }
>
> Please fold the two if()s into one (unless of course you know that the outer
> one is going to be extended in a subsequent patch).
>
Yes, other code will be added in the outer if with patch2/8.
> > --- a/gas/doc/c-i386.texi
> > +++ b/gas/doc/c-i386.texi
> > @@ -216,6 +216,7 @@ accept various extension mnemonics. For example,
> > @code{avx10.1/512}, @code{avx10.1/256}, @code{avx10.1/128},
> > +@code{apx},
> > @code{amx_int8},
> > @code{amx_bf16},
> > @code{amx_fp16},
> > @@ -1662,7 +1663,7 @@ supported on the CPU specified. The choices for
> @var{cpu_type} are:
> > @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab
> > @samp{.cx16} @item @samp{.padlock} @tab @samp{.clzero} @tab
> > @samp{.mwaitx} @tab @samp{.rdpru} @item @samp{.mcommit} @tab
> > @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb} -@item
> > @samp{.tlbsync}
> > +@item @samp{.tlbsync} @tab @samp{.apx}
> > @end multitable
>
> DYM apx_f in both cases?
>
> Also don't you need to also mention {rex2} somewhere in this file?
>
Done.
> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> > [ ]*[a-f0-9]+: 37 \(bad\)
> >
> > 0+1 <aad0>:
> > -[ ]*[a-f0-9]+: d5 \(bad\)
> > +[ ]*[a-f0-9]+: d5 rex2
> > [ ]*[a-f0-9]+: 0a .byte 0xa
> >
> > 0+3 <aad1>:
> > -[ ]*[a-f0-9]+: d5 \(bad\)
> > +[ ]*[a-f0-9]+: d5 rex2
> > [ ]*[a-f0-9]+: 02 .byte 0x2
> >
> > 0+5 <aam0>:
> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> > [ ]*[a-f0-9]+: 37 \(bad\)
> >
> > 0+1 <aad0>:
> > -[ ]*[a-f0-9]+: d5 \(bad\)
> > +[ ]*[a-f0-9]+: d5 rex2
> > [ ]*[a-f0-9]+: 0a .byte 0xa
> >
> > 0+3 <aad1>:
> > -[ ]*[a-f0-9]+: d5 \(bad\)
> > +[ ]*[a-f0-9]+: d5 rex2
> > [ ]*[a-f0-9]+: 02 .byte 0x2
> >
> > 0+5 <aam0>:
>
> These expectations match the ones of the same test in the parent directory.
> Hence instead of adjusting each in both places, please have the ones here
> reference the parent directory files.
>
They are used to test illegal opcodes for x86-64. Since D5 now makes sense, these two test cases were removed.
# All the followings are illegal opcodes for x86-64.
aad0:
aad
aad1:
aad $2
> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
>
> As before I'll look at the disassembler changes separately. This patch is simply
> too big.
>
Ok
> > @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> > return elem_size;
> > }
> >
> > +static bool
> > +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
> int space,
> > + const char *cpu_flags)
> > +{
> > + /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD. */
> > + if (strcmp (cpu_flags, "XSAVES") >= 0
> > + || strcmp (cpu_flags, "XSAVEC") >= 0
> > + || strcmp (cpu_flags, "Xsave") >= 0
> > + || strcmp (cpu_flags, "Xsaveopt") >= 0
>
> Upon further thought for these (and maybe even ...
>
> > + || !strcmp (cpu_flags, "3dnow")
> > + || !strcmp (cpu_flags, "3dnowA"))
>
> ... for these, but see also below) it might be better to add the attribute right in
> the opcode table.
>
> As to the 3dnow insns - I think I'd like to revise my earlier suggestion to also
> tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so allowing them
> to be used like that would appear only consistent. Otherwise, if we were
> concerned of AMD extensions in general, SSE4a insns (and maybe further
> ones) would also need excluding. (Additionally recall that there's an overlap
> between 3dnowa and SSE, which would result in another [apparent]
> inconsistency when excluding 3dnow insns here.)
>
I see, for example I think I need to split this table into two parts, one is for SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }
Thanks,
Lili.
next prev parent reply other threads:[~2023-11-06 15:20 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-11-02 17:05 ` Jan Beulich
2023-11-03 6:20 ` Cui, Lili
2023-11-03 13:05 ` Jan Beulich
2023-11-03 14:19 ` Jan Beulich
2023-11-06 15:20 ` Cui, Lili [this message]
2023-11-06 16:08 ` Jan Beulich
2023-11-07 8:16 ` Cui, Lili
2023-11-07 10:43 ` Jan Beulich
2023-11-07 15:31 ` Cui, Lili
2023-11-07 15:43 ` Jan Beulich
2023-11-07 15:53 ` Cui, Lili
2023-11-06 15:02 ` Jan Beulich
2023-11-07 8:06 ` Cui, Lili
2023-11-07 10:20 ` Jan Beulich
2023-11-07 14:32 ` Cui, Lili
2023-11-07 15:08 ` Jan Beulich
2023-11-06 15:39 ` Jan Beulich
2023-11-09 8:02 ` Cui, Lili
2023-11-09 10:52 ` Jan Beulich
2023-11-09 13:27 ` Cui, Lili
2023-11-09 15:22 ` Jan Beulich
2023-11-10 7:11 ` Cui, Lili
2023-11-10 9:14 ` Jan Beulich
2023-11-10 9:21 ` Jan Beulich
2023-11-10 12:38 ` Cui, Lili
2023-12-14 10:13 ` Cui, Lili
2023-12-18 15:24 ` Jan Beulich
2023-12-18 16:23 ` H.J. Lu
2023-11-10 9:47 ` Cui, Lili
2023-11-10 9:57 ` Jan Beulich
2023-11-10 12:05 ` Cui, Lili
2023-11-10 12:35 ` Jan Beulich
2023-11-13 0:18 ` Cui, Lili
2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
2023-11-08 9:11 ` Jan Beulich
2023-11-15 14:56 ` Cui, Lili
2023-11-16 9:17 ` Jan Beulich
2023-11-16 15:34 ` Cui, Lili
2023-11-16 16:50 ` Jan Beulich
2023-11-17 12:42 ` Cui, Lili
2023-11-17 14:38 ` Jan Beulich
2023-11-22 13:40 ` Cui, Lili
2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
2023-11-08 10:39 ` Jan Beulich
2023-11-20 1:19 ` Cui, Lili
2023-11-08 11:13 ` Jan Beulich
2023-11-20 12:36 ` Cui, Lili
2023-11-20 16:33 ` Jan Beulich
2023-11-22 7:46 ` Cui, Lili
2023-11-22 8:47 ` Jan Beulich
2023-11-22 10:45 ` Cui, Lili
2023-11-23 10:57 ` Jan Beulich
2023-11-23 12:14 ` Cui, Lili
2023-11-24 6:56 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
2023-12-07 8:17 ` Cui, Lili
2023-12-07 8:33 ` Cui, Lili
2023-11-09 9:37 ` [PATCH 5/8] Support APX NDD Jan Beulich
2023-11-20 1:33 ` Cui, Lili
2023-11-20 8:19 ` Jan Beulich
2023-11-20 12:54 ` Cui, Lili
2023-11-20 16:43 ` Jan Beulich
2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-11-08 11:44 ` Jan Beulich
2023-11-08 12:52 ` Jan Beulich
2023-11-22 5:48 ` Cui, Lili
2023-11-22 8:53 ` Jan Beulich
2023-11-22 12:26 ` Cui, Lili
2023-11-09 9:57 ` Jan Beulich
2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
2023-11-09 10:36 ` Jan Beulich
2023-11-10 5:43 ` Hu, Lin1
2023-11-10 9:54 ` Jan Beulich
2023-11-14 2:28 ` Hu, Lin1
2023-11-14 10:50 ` Jan Beulich
2023-11-15 2:52 ` Hu, Lin1
2023-11-15 8:57 ` Jan Beulich
2023-11-15 2:59 ` [PATCH][v3] " Hu, Lin1
2023-11-15 9:34 ` Jan Beulich
2023-11-17 7:24 ` Hu, Lin1
2023-11-17 9:47 ` Jan Beulich
2023-11-20 3:28 ` Hu, Lin1
2023-11-20 8:34 ` Jan Beulich
2023-11-14 2:58 ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
2023-11-14 11:20 ` Jan Beulich
2023-11-15 1:49 ` Hu, Lin1
2023-11-15 8:52 ` Jan Beulich
2023-11-17 3:27 ` Hu, Lin1
2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-11-09 12:59 ` Jan Beulich
2023-11-14 3:26 ` Hu, Lin1
2023-11-14 11:15 ` Jan Beulich
2023-11-24 5:40 ` Hu, Lin1
2023-11-24 7:21 ` Jan Beulich
2023-11-27 2:16 ` Hu, Lin1
2023-11-27 8:03 ` Jan Beulich
2023-11-27 8:46 ` Hu, Lin1
2023-11-27 8:54 ` Jan Beulich
2023-11-27 9:03 ` Hu, Lin1
2023-11-27 10:32 ` Jan Beulich
2023-12-04 7:33 ` Hu, Lin1
2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
2023-11-03 16:42 ` Cui, Lili
2023-11-06 7:30 ` Jan Beulich
2023-11-06 14:20 ` Cui, Lili
2023-11-06 14:44 ` Jan Beulich
2023-11-06 16:03 ` Cui, Lili
2023-11-06 16:10 ` Jan Beulich
2023-11-07 1:53 ` Cui, Lili
2023-11-07 10:11 ` Jan Beulich
-- strict thread matches above, loose matches on Subject: below --
2023-09-19 15:25 [PATCH 0/8] [RFC] " Cui, Lili
2023-09-19 15:25 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-09-21 15:27 ` Jan Beulich
2023-09-27 15:57 ` Cui, Lili
2023-09-21 15:51 ` Jan Beulich
2023-09-27 15:59 ` Cui, Lili
2023-09-28 8:02 ` Jan Beulich
2023-10-07 3:27 ` Cui, Lili
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=SJ0PR11MB560042CC079FE35596CD34E49EAAA@SJ0PR11MB5600.namprd11.prod.outlook.com \
--to=lili.cui@intel.com \
--cc=JBeulich@suse.com \
--cc=binutils@sourceware.org \
--cc=ccoutant@gmail.com \
--cc=hongjiu.lu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).