RE: [PATCH 1/2] Support Intel MOVRS

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: "Jiang, Haochen" <haochen.jiang@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>, "Hu, Lin1" <lin1.hu@intel.com>
Cc: "hjl.tools@gmail.com" <hjl.tools@gmail.com>,
	"Cui, Lili" <lili.cui@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 1/2] Support Intel MOVRS
Date: Mon, 30 Dec 2024 07:39:20 +0000	[thread overview]
Message-ID: <SA1PR11MB5946120E91F78673E2E8D45CEC092@SA1PR11MB5946.namprd11.prod.outlook.com> (raw)
In-Reply-To: <a7b90d82-841b-4901-bf1c-1913c8ed598c@suse.com>

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, December 27, 2024 8:27 PM
> 
> On 24.12.2024 10:24, Haochen Jiang wrote:
> > From: "Hu, Lin1" <lin1.hu@intel.com>
> >
> > The APX_F extension for MOVRS will be:
> >   - EVEX.LLZ.NP.MAP4.SCALABLE 8A !(11):rrr:bbb for r8/m8 with NF=0 and
> >     ND=0
> 
> This got to be IGNORED, not SCALABLE, to match other byte insns in APX?

It should not be SCALABLE, I copied it wrongly. But it is missing IGNORED.
I suppose it should be added. Let me check that.

> >
> > We did not merge the table together for APX_F since there is an
> > explicit
> > x64 for movrs insn. The current APX_F() did not support the
> > combination between CPUIDs.
> 
> This is an odd remark to make. APX_F() makes sense to use only for dual
> VEX/EVEX templates, yet the non-APX forms of MOVRS are legacy-encoded.
> See also the ENQCMD{,S} cleanup patch just sent.

I just got deeper understanding on APX_F() through that patch, I will remove this.

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/movrs-inval.s
> > @@ -0,0 +1,12 @@
> > +# Check Illegal 32bit MOVRS instructions
> > +
> > +	.text
> > +_start:
> > +	movrs	0x10000000(%esp, %esi, 8), %dx
> > +	movrs	0x10000000(%esp, %esi, 8), %edx
> > +	movrs	0x10000000(%esp, %esi, 8), %r12
> > +	movrs	0x10000000(%esp, %esi, 8), %bl
> > +	vmovrsb	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > +	vmovrsd	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > +	vmovrsq	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> > +	vmovrsw	0x10000000(%esp, %esi, 8), %zmm6{%k7}
> 
> Once again I'm uncertain of the usefulness of this test.

I did not remove this since MOVRS got prefetchrst2 which is valid
under 32-bit. It is not a whole CPUID under 64-bit.

I am okay for keeping it or removing it.

> 
> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
> > @@ -118,6 +118,10 @@ Disassembly of section \.text:
> >  [	 ]*[a-f0-9]+:[	 ]*62 4c .d 08 f8 bc 87 23 01 00 00[
> 	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
> >  [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[
> 	 ]+%r25d,0x123\(%r31,%rax,4\)
> >  [	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[
> 	 ]+%r31,0x123\(%r31,%rax,4\)
> > +[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 8a 84 87 23 01 00 00[	 ]+movrs[
> 	 ]+0x123\(%r31,%rax,4\),%r16b
> 
> This one's WIG, but doesn't end up with W=1. I wonder whether that's related
> to the very last comment (on the opcode table entries).

Initially MOVRS is implemented under W0, which is wrong. I did a last minute
change to let it skip W. However, APX_F part is forgotten. I will handle that part.

> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-movrs-suffix.d
> > @@ -0,0 +1,13 @@
> > +#objdump: -dwMsuffix
> > +#name: x86_64 MOVRS insns
> > +
> > +.*: +file format .*
> > +
> > +Disassembly of section \.text:
> > +
> > +0+ <_start>:
> > +\s*[a-f0-9]+:\s*66 0f 38 8b 92 00 ff ff ff\s+movrsw
> > +-0x100\(%rdx\),%dx \s*[a-f0-9]+:\s*0f 38 8b 92 00 fe ff ff\s+movrsl
> > +-0x200\(%rdx\),%edx \s*[a-f0-9]+:\s*4c 0f 38 8b a2 00 fc ff
> > +ff\s+movrsq -0x400\(%rdx\),%r12 \s*[a-f0-9]+:\s*0f 38 8a 5a
> > +80\s+movrsb -0x80\(%rdx\),%bl #pass
> 
> While I agree with having a separate -Msuffix test, ...
> 
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-movrs-suffix.s
> > @@ -0,0 +1,15 @@
> > +# Check 64bit MOVRS instructions
> > +
> > +	.text
> > +_start:
> > +	movrsw	-256(%rdx), %dx
> > +	movrsl	-512(%rdx), %edx
> > +	movrsq	-1024(%rdx), %r12
> > +	movrsb	-128(%rdx), %bl
> 
> ... the source imo wants folding into x86-64-movrs.s and ...
> 
> > +_intel:
> > +	.intel_syntax noprefix
> > +	movrsw	dx, WORD PTR [rdx-256]
> > +	movrsl	edx, DWORD PTR [rdx-512]
> > +	movrsq	r12, QWORD PTR [rdx-1024]
> > +	movrsb	bl, BYTE PTR [rdx-128]
> 
> ... further bogus use of suffixes in Intel syntax shouldn't be added in testcases.

I will change them.

> 
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -3554,3 +3554,17 @@ vcomxs<sdh>, 0x<sdh:spfx>2f, AVX10_2,
> > Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8M
> >  vucomxs<sdh>, 0x<sdh:spfx>2e, AVX10_2,
> > Modrm|EVexLIG|<sdh:spc1>|<sdh:vexw>|Disp8MemShift|NoSuf|SAE, {
> > RegXMM|<sdh:elem>|Unspecified|BaseIndex, RegXMM }
> >
> >  // AVX10.2 instructions end.
> > +
> > +// MOVRS instructions.
> > +
> > +prefetchrst2, 0xf18/4, MOVRS, Modrm|Anysize|IgnoreSize|NoSuf, {
> > +BaseIndex }
> > +
> > +movrs, 0x8a, MOVRS&x64,
> > +Modrm|Space0F38|No_wSuf|No_lSuf|No_sSuf|No_qSuf, {
> > +Byte|Unspecified|BaseIndex, Reg8 } movrs, 0x8b, MOVRS&x64,
> > +Modrm|Anysize|Space0F38|CheckOperandSize|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> movrs,
> > +0x8a, MOVRS&APX_F,
> >
> +Modrm|EVex128|EVexMap4|VexW0|No_wSuf|No_lSuf|No_sSuf|No_qSuf
> , {
> > +Byte|Unspecified|BaseIndex, Reg8 } movrs, 0x8b, MOVRS&APX_F,
> > +Modrm|CheckOperandSize|EVex128|EVexMap4|No_bSuf|No_sSuf, {
> > +Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> 
> These want to leverage W if at all possible, allowing to go from 4 to 2
> templates.

It is not a VEX/EVEX promotion, but a legacy/EVEX promotion. Could we do
that?

If we could, the problem here is still MOVRS is not 64-bit default. Then the
non-APX part needs x64, making it multiple CPUIDs. Maybe it also needs
work around in cpu_flags_match second assert just like
AMX_MOVRS & AMX_TRANSPOSE. At least we could check the all.bitfield
to let them skip that assert.

Thx,
Haochen

next prev parent reply	other threads:[~2024-12-30  7:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-24  9:24 [PATCH 0/2] Support Intel MOVRS related insns Haochen Jiang
2024-12-24  9:24 ` [PATCH 1/2] Support Intel MOVRS Haochen Jiang
2024-12-27 12:26   ` Jan Beulich
2024-12-30  7:39     ` Jiang, Haochen [this message]
2024-12-30  7:59       ` Jiang, Haochen
2025-01-06  8:38       ` Jan Beulich
2025-01-07  2:57         ` Jiang, Haochen
2025-01-07  8:55           ` Jan Beulich
2025-01-07 10:43             ` Jiang, Haochen
2024-12-24  9:24 ` [PATCH v2 2/2] Support Intel AMX-MOVRS Haochen Jiang
2024-12-27 12:47   ` Jan Beulich
2024-12-30  3:25     ` Jiang, Haochen
2025-01-07  3:10       ` Jiang, Haochen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SA1PR11MB5946120E91F78673E2E8D45CEC092@SA1PR11MB5946.namprd11.prod.outlook.com \
    --to=haochen.jiang@intel.com \
    --cc=JBeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=hjl.tools@gmail.com \
    --cc=lili.cui@intel.com \
    --cc=lin1.hu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).