From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: "hjl.tools@gmail.com" <hjl.tools@gmail.com>,
"binutils@sourceware.org" <binutils@sourceware.org>
Subject: Re: [PATCH 2/3] x86: Drop SwapSources
Date: Mon, 29 Apr 2024 15:08:55 +0200 [thread overview]
Message-ID: <de515e88-0b33-4678-a47f-ca394de3886c@suse.com> (raw)
In-Reply-To: <SJ0PR11MB5600990BB714D915F249D4B79E1B2@SJ0PR11MB5600.namprd11.prod.outlook.com>
On 29.04.2024 14:23, Cui, Lili wrote:
>> On 28.04.2024 06:47, Cui, Lili wrote:
>>>> On 26.04.2024 10:14, Cui, Lili wrote:
>>>>>> On 24.04.2024 09:23, Cui, Lili wrote:
>>>>>>> --- a/gas/config/tc-i386.c
>>>>>>> +++ b/gas/config/tc-i386.c
>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void)
>>>>>>>
>>>>>>> switch (i.tm.opcode_modifier.vexvvvv)
>>>>>>> {
>>>>>>> + case VexVVVV_SRC2:
>>>>>>> + if (source != op)
>>>>>>> + {
>>>>>>> + v = source++;
>>>>>>> + break;
>>>>>>> + }
>>>>>>> + /* For XOP: vpshl* and vpsha*. */
>>>>>>> + /* Fall through. */
>>>>>>> case VexVVVV_SRC1:
>>>>>>
>>>>>> This falling-through is odd and hence needs a better comment (then
>>>>>> also covering vprot*, which afaict is similarly affected). The
>>>>>> reason for this is the XOP.W-controlled operand swapping, if I'm
>>>>>> not mistaken? In which case perhaps instead of the fall-through
>>>>>> here the logic swapping the operands should replace VexVVVV_SRC2 by
>>>> VexVVVV_SRC1?
>>>>>>
>>>>>
>>>>> Yes, vprot* should be included, and it is related to
>>>>> XOP.W-controlled
>>>> operand swapping, the comments says " /* Only the first two register
>>>> operands need reversing, alongside flipping VEX.W. */ ", But there
>>>> is actually a memory operand, not two register operands.
>>>>>
>>>>> I think VexVVVV_SRC2 makes more sense here, it matches the actual
>>>> situation, we want to use vvvv to encode the first operand.
>>>>>
>>>>> Opcode table:
>>>>> vprot<xop>, 0x90 | <xop:opc>, XOP,
>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf,
>> { RegXMM,
>>>>> RegXMM|Unspecified|BaseIndex, RegXMM }
>>>>>
>>>>> testcase:
>>>>> vprotb (%rax),%xmm12,%xmm15
>>>>> vprotb %xmm15,(%r12),%xmm0
>>>>
>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the
>>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to replace
>>>> the attribute when swapping operands.
>>>>
>>>
>>> If replace the Src2VVVV| VexW0 with Src1VVVV| VexW1 and swapping
>> operands. We can put VexVVVV_SRC1 before VexVVVV_SRC2, but we still
>> need to add "(!is_cpu (&i.tm, CpuXOP) || source == op" under
>> VexVVVV_SRC1 , and match_template also needs to be adjusted (I made a
>> simple modification and it still failed, I think continuing like this may go
>> against the original intention).
>>>
>>> switch (i.tm.opcode_modifier.vexvvvv)
>>> {
>>> /* VEX.vvvv encodes the first source register operand. */
>>> case VexVVVV_SRC1:
>>> if (!is_cpu (&i.tm, CpuXOP) || source == op)
>>> {
>>> v = dest - 1;
>>> break;
>>> }
>>> /* For XOP: vpshl*, vpsha* and vprot*. */
>>> /* Fall through. */
>>> /* VEX.vvvv encodes the last source register operand. */
>>> case VexVVVV_SRC2:
>>> v = source++;
>>> break;
>>> /* VEX.vvvv encodes the destination register operand. */
>>> case VexVVVV_DST:
>>> v = dest--;
>>> break;
>>> default:
>>> v = ~0;
>>> break;
>>> }
>>>
>>> Do you think we should add a separate patch 4 for XOP that removes the
>> special handling in match_template and completes its template? so we don't
>> have to add special handling for src1vvvv or src2vvvv. This might go against
>> your desire to reduce template size, but it would help simplify the logic. I'd
>> like to know your thoughts.
>>
>> Indeed. You'd effectively revert earlier folding that I did. And the adjustment I
>> suggested earlier ought to be small/simple enough.
>>
>
> So, I continued working on the previous suggestion. With the following modification and it worked.
>
> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix)
> || is_cpu (t, CpuAPX_F));
> if (!operand_type_match (overlap0, i.types[0])
> || !operand_type_match (overlap1, i.types[j])
> - || (t->operands == 3
> + || (t->operands == 3 && !is_cpu (t, CpuXOP)
> && !operand_type_match (overlap2, i.types[1]))
> || (check_register
> && !operand_type_register_match (i.types[0],
Just to mention it - this certainly isn't what I suggested. In fact I seem to
vaguely recall that something similar was once proposed during the original
APX work as well, where I then objected, too.
> But I found that there are 4 test files that failed, I didn't find the doc on how to encode vprotb but I guess that is because I changed the default template from Src2VVVV| VexW0 to Src1VVVV| VexW1, then all the related test cases needed to be modified. Do you have any comments here?
>
> regexp "^[ ]*[a-f0-9]+: 8f e9 40 90 d8[ ]+vprotb %xmm7,%xmm0,%xmm3$"
> line " 11ed: 8f e9 f8 90 df vprotb %xmm7,%xmm0,%xmm3"
Well, while changing the templates is in principle possible, and the
resulting code would still be correct, changing encodings it usually not a
good idea. Thus when it can be avoided, it should be avoided, imo. Hence
why I didn't suggest this, but to amend the code doing the operand swapping
(for the case where operand order is controlled by XOP.W).
Jan
next prev parent reply other threads:[~2024-04-29 13:08 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-24 7:23 [PATCH 0/3] x86: Optimize the encoder of the vvvv register Cui, Lili
2024-04-24 7:23 ` [PATCH 1/3] x86: Use vexvvvv to encode " Cui, Lili
2024-04-24 7:52 ` Jan Beulich
2024-04-25 13:14 ` Cui, Lili
2024-04-25 13:22 ` Jan Beulich
2024-04-26 5:33 ` Cui, Lili
2024-04-26 6:52 ` Jan Beulich
2024-04-24 7:23 ` [PATCH 2/3] x86: Drop SwapSources Cui, Lili
2024-04-24 8:07 ` Jan Beulich
2024-04-26 8:14 ` Cui, Lili
2024-04-26 10:37 ` Jan Beulich
2024-04-28 4:47 ` Cui, Lili
2024-04-29 6:40 ` Jan Beulich
2024-04-29 12:23 ` Cui, Lili
2024-04-29 13:08 ` Jan Beulich [this message]
2024-04-29 13:41 ` Cui, Lili
2024-04-29 13:49 ` Jan Beulich
2024-04-30 2:56 ` Cui, Lili
2024-04-30 6:18 ` Jan Beulich
2024-04-30 7:34 ` Cui, Lili
2024-04-30 9:22 ` Jan Beulich
2024-04-24 7:23 ` [PATCH 3/3] x86: Drop using extension_opcode to encode vvvv register Cui, Lili
2024-04-24 8:19 ` Jan Beulich
2024-04-24 8:27 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de515e88-0b33-4678-a47f-ca394de3886c@suse.com \
--to=jbeulich@suse.com \
--cc=binutils@sourceware.org \
--cc=hjl.tools@gmail.com \
--cc=lili.cui@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).