From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "hjl.tools@gmail.com" <hjl.tools@gmail.com>,
"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 2/3] x86: Drop SwapSources
Date: Tue, 30 Apr 2024 02:56:18 +0000 [thread overview]
Message-ID: <SJ0PR11MB5600B46E5A5F064BC4BFF7D39E1A2@SJ0PR11MB5600.namprd11.prod.outlook.com> (raw)
In-Reply-To: <787a00d3-7852-4c7d-ad03-bfa0ecc5a8bd@suse.com>
> On 29.04.2024 15:41, Cui, Lili wrote:
> >> On 29.04.2024 14:23, Cui, Lili wrote:
> >>>> On 28.04.2024 06:47, Cui, Lili wrote:
> >>>>>> On 26.04.2024 10:14, Cui, Lili wrote:
> >>>>>>>> On 24.04.2024 09:23, Cui, Lili wrote:
> >>>>>>>>> --- a/gas/config/tc-i386.c
> >>>>>>>>> +++ b/gas/config/tc-i386.c
> >>>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void)
> >>>>>>>>>
> >>>>>>>>> switch (i.tm.opcode_modifier.vexvvvv)
> >>>>>>>>> {
> >>>>>>>>> + case VexVVVV_SRC2:
> >>>>>>>>> + if (source != op)
> >>>>>>>>> + {
> >>>>>>>>> + v = source++;
> >>>>>>>>> + break;
> >>>>>>>>> + }
> >>>>>>>>> + /* For XOP: vpshl* and vpsha*. */
> >>>>>>>>> + /* Fall through. */
> >>>>>>>>> case VexVVVV_SRC1:
> >>>>>>>>
> >>>>>>>> This falling-through is odd and hence needs a better comment
> >>>>>>>> (then also covering vprot*, which afaict is similarly affected).
> >>>>>>>> The reason for this is the XOP.W-controlled operand swapping,
> >>>>>>>> if I'm not mistaken? In which case perhaps instead of the
> >>>>>>>> fall-through here the logic swapping the operands should
> >>>>>>>> replace
> >>>>>>>> VexVVVV_SRC2 by
> >>>>>> VexVVVV_SRC1?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, vprot* should be included, and it is related to
> >>>>>>> XOP.W-controlled
> >>>>>> operand swapping, the comments says " /* Only the first two
> >>>>>> register operands need reversing, alongside flipping VEX.W. */
> >>>>>> ", But there is actually a memory operand, not two register operands.
> >>>>>>>
> >>>>>>> I think VexVVVV_SRC2 makes more sense here, it matches the
> >>>>>>> actual
> >>>>>> situation, we want to use vvvv to encode the first operand.
> >>>>>>>
> >>>>>>> Opcode table:
> >>>>>>> vprot<xop>, 0x90 | <xop:opc>, XOP,
> >>>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf,
> >>>> { RegXMM,
> >>>>>>> RegXMM|Unspecified|BaseIndex, RegXMM }
> >>>>>>>
> >>>>>>> testcase:
> >>>>>>> vprotb (%rax),%xmm12,%xmm15
> >>>>>>> vprotb %xmm15,(%r12),%xmm0
> >>>>>>
> >>>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the
> >>>>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to
> >>>>>> replace the attribute when swapping operands.
> >>>>>>
> >>>>>
> >>>>> If replace the Src2VVVV| VexW0 with Src1VVVV| VexW1 and swapping
> >>>> operands. We can put VexVVVV_SRC1 before VexVVVV_SRC2, but we
> still
> >>>> need to add "(!is_cpu (&i.tm, CpuXOP) || source == op" under
> >>>> VexVVVV_SRC1 , and match_template also needs to be adjusted (I
> made
> >>>> a simple modification and it still failed, I think continuing like
> >>>> this may go against the original intention).
> >>>>>
> >>>>> switch (i.tm.opcode_modifier.vexvvvv)
> >>>>> {
> >>>>> /* VEX.vvvv encodes the first source register operand. */
> >>>>> case VexVVVV_SRC1:
> >>>>> if (!is_cpu (&i.tm, CpuXOP) || source == op)
> >>>>> {
> >>>>> v = dest - 1;
> >>>>> break;
> >>>>> }
> >>>>> /* For XOP: vpshl*, vpsha* and vprot*. */
> >>>>> /* Fall through. */
> >>>>> /* VEX.vvvv encodes the last source register operand. */
> >>>>> case VexVVVV_SRC2:
> >>>>> v = source++;
> >>>>> break;
> >>>>> /* VEX.vvvv encodes the destination register operand. */
> >>>>> case VexVVVV_DST:
> >>>>> v = dest--;
> >>>>> break;
> >>>>> default:
> >>>>> v = ~0;
> >>>>> break;
> >>>>> }
> >>>>>
> >>>>> Do you think we should add a separate patch 4 for XOP that removes
> >>>>> the
> >>>> special handling in match_template and completes its template? so
> >>>> we don't have to add special handling for src1vvvv or src2vvvv.
> >>>> This might go against your desire to reduce template size, but it
> >>>> would help simplify the logic. I'd like to know your thoughts.
> >>>>
> >>>> Indeed. You'd effectively revert earlier folding that I did. And
> >>>> the adjustment I suggested earlier ought to be small/simple enough.
> >>>>
> >>>
> >>> So, I continued working on the previous suggestion. With the
> >>> following
> >> modification and it worked.
> >>>
> >>> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
> >>> || is_cpu (t, CpuAPX_F));
> >>> if (!operand_type_match (overlap0, i.types[0])
> >>> || !operand_type_match (overlap1, i.types[j])
> >>> - || (t->operands == 3
> >>> + || (t->operands == 3 && !is_cpu (t, CpuXOP)
> >>> && !operand_type_match (overlap2, i.types[1]))
> >>> || (check_register
> >>> && !operand_type_register_match (i.types[0],
> >>
> >> Just to mention it - this certainly isn't what I suggested. In fact I
> >> seem to vaguely recall that something similar was once proposed
> >> during the original APX work as well, where I then objected, too.
> >>
> >>> But I found that there are 4 test files that failed, I didn't find
> >>> the doc on
> >> how to encode vprotb but I guess that is because I changed the
> >> default template from Src2VVVV| VexW0 to Src1VVVV| VexW1, then all
> >> the related test cases needed to be modified. Do you have any comments
> here?
> >>>
> >>> regexp "^[ ]*[a-f0-9]+: 8f e9 40 90
> >> d8[ ]+vprotb %xmm7,%xmm0,%xmm3$"
> >>> line " 11ed: 8f e9 f8 90 df vprotb %xmm7,%xmm0,%xmm3"
> >>
> >> Well, while changing the templates is in principle possible, and the
> >> resulting code would still be correct, changing encodings it usually not a
> good idea.
> >> Thus when it can be avoided, it should be avoided, imo. Hence why I
> >> didn't suggest this, but to amend the code doing the operand swapping
> >> (for the case where operand order is controlled by XOP.W).
> >>
> >
> > I mistakenly thought this was what you wanted, and I also think this
> modification is unacceptable. Could you elaborate further on your original
> suggestion?
>
> At the bottom of match_template() we have
>
> case Opcode_VexW:
> /* Only the first two register operands need reversing, alongside
> flipping VEX.W. */
> i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
>
> This is where I think you further want to adjust
> i.tm.opcode_modifier.vexvvvv.
>
"vprotb %xmm7,%xmm0,%xmm3" doesn't go through this place, it finds the matching template directly (Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }). I inserted a judgment to solve this problem, it's a bit ugly. I'd like to abandon this optimization.
To avoid confusion, I post all the changes.
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
|| is_cpu (t, CpuAPX_F));
if (!operand_type_match (overlap0, i.types[0])
|| !operand_type_match (overlap1, i.types[j])
- || (t->operands == 3
+ || (t->operands == 3 && !is_cpu (t, CpuXOP) --> This place also needs to change.
&& !operand_type_match (overlap2, i.types[1]))
|| (check_register
&& !operand_type_register_match (i.types[0],
@@ -9035,6 +9035,10 @@ match_template (char mnem_suffix)
specific_error = progress (i.error);
continue;
}
+ if (is_cpu (t, CpuXOP) && operand_types[0].bitfield.baseindex --> It's ugly, and still has some test cases failing for other XOP instructions.
+ && i.types[0].bitfield.class == RegSIMD
+ && t->opcode_modifier.vexw == VEXW1)
+ found_reverse_match = Opcode_VexW;
break;
}
}
@@ -10434,19 +10434,21 @@ build_modrm_byte (void)
switch (i.tm.opcode_modifier.vexvvvv)
{
- /* VEX.vvvv encodes the first source register operand. */
- case VexVVVV_SRC2:
- if (source != op)
+ /* VEX.vvvv encodes the first source register operand. */
+ case VexVVVV_SRC1:
+ if (!is_cpu (&i.tm, CpuXOP) || op == 0) --> changed src1vvvv to src2vvvv of " vprotb %xmm7,%xmm0,%xmm3".
{
- v = source++;
+ v = dest - 1;
break;
}
- /* For XOP: vpshl* and vpsha*. */
+ /* For XOP: vpshl*, vpsha* and vprot*. */
/* Fall through. */
- case VexVVVV_SRC1:
- v = dest - 1;
+ /* VEX.vvvv encodes the last source register operand. */
+ case VexVVVV_SRC2:
+ v = source++;
break;
- /* VEX.vvvv encodes the destination register operand. */
+
+ /* VEX.vvvv encodes the destination register operand. */
case VexVVVV_DST:
v = dest--;
break;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 5d6f1c74449..5eef1cabd8c 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -1938,10 +1938,10 @@ vpmacsww, 0x95, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, Reg
vpmadcsswd, 0xa6, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
vpmadcswd, 0xb6, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
vpperm, 0xa3, XOP, D|Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
-vprot<xop>, 0x90 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
+vprot<xop>, 0x90 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
vprot<xop>, 0xc0 | <xop:opc>, XOP, Modrm|Vex128|SpaceXOP08|VexW0|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
-vpsha<xop>, 0x98 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
-vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM }
+vpsha<xop>, 0x98 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
+vpshl<xop>, 0x94 | <xop:opc>, XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }
Thanks,
Lili.
next prev parent reply other threads:[~2024-04-30 2:56 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-24 7:23 [PATCH 0/3] x86: Optimize the encoder of the vvvv register Cui, Lili
2024-04-24 7:23 ` [PATCH 1/3] x86: Use vexvvvv to encode " Cui, Lili
2024-04-24 7:52 ` Jan Beulich
2024-04-25 13:14 ` Cui, Lili
2024-04-25 13:22 ` Jan Beulich
2024-04-26 5:33 ` Cui, Lili
2024-04-26 6:52 ` Jan Beulich
2024-04-24 7:23 ` [PATCH 2/3] x86: Drop SwapSources Cui, Lili
2024-04-24 8:07 ` Jan Beulich
2024-04-26 8:14 ` Cui, Lili
2024-04-26 10:37 ` Jan Beulich
2024-04-28 4:47 ` Cui, Lili
2024-04-29 6:40 ` Jan Beulich
2024-04-29 12:23 ` Cui, Lili
2024-04-29 13:08 ` Jan Beulich
2024-04-29 13:41 ` Cui, Lili
2024-04-29 13:49 ` Jan Beulich
2024-04-30 2:56 ` Cui, Lili [this message]
2024-04-30 6:18 ` Jan Beulich
2024-04-30 7:34 ` Cui, Lili
2024-04-30 9:22 ` Jan Beulich
2024-04-24 7:23 ` [PATCH 3/3] x86: Drop using extension_opcode to encode vvvv register Cui, Lili
2024-04-24 8:19 ` Jan Beulich
2024-04-24 8:27 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=SJ0PR11MB5600B46E5A5F064BC4BFF7D39E1A2@SJ0PR11MB5600.namprd11.prod.outlook.com \
--to=lili.cui@intel.com \
--cc=JBeulich@suse.com \
--cc=binutils@sourceware.org \
--cc=hjl.tools@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).