From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: "hjl.tools@gmail.com" <hjl.tools@gmail.com>,
"binutils@sourceware.org" <binutils@sourceware.org>
Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
Date: Tue, 20 Jul 2021 15:02:54 +0200
Message-ID: <0db58f35-ebe7-5d80-d239-b4371ddf18fb@suse.com>
In-Reply-To: <BY5PR11MB400880A49AA4C0B2BE22BBFB9EE29@BY5PR11MB4008.namprd11.prod.outlook.com>
On 20.07.2021 13:26, Cui, Lili wrote:
>
>
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, July 20, 2021 4:46 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16
>> instructions
>>
>> On 20.07.2021 09:08, Cui, Lili wrote:
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Wednesday, July 14, 2021 11:21 PM
>>>> To: Cui, Lili <lili.cui@intel.com>
>>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>>>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16
>>>> instructions
>>>>
>>>> On 13.07.2021 08:58, Cui, Lili wrote:
>>>>
>>>> Disassembler:
>>>>
>>>> d_scalar_mode looks to be unused.
>>>>
>>>> This
>>>>
>>>> /* EVEX_W_MAP5_2A_P_1 */
>>>> {
>>>> { "vcvtsi2sh{%LQ|}", { XMScalar, VexScalar, EXxEVexR, Ed }, 0 },
>>>> { "vcvtsi2sh{%LQ|}", { XMScalar, VexScalar, EXxEVexR, Eq }, 0 },
>>>> },
>>>>
>>>> can imo be expressed without decoding EVEX.W, by using Edq instead of
>>>> (separately) Ed and Eq. There's at least one similar case elsewhere.
>>>> Interestingly in the 2si/2usi conversions you do use Gdq already,
>>>> which I think handles the EVEX.W=1 case correctly outside of 64-bit
>>>> mode (unlike Eq, which will unconditionally produce 64-bit register names
>>>> afaict).
>>>>
>>>> As to a broader question on decoding EVEX.W: Did you consider
>>>> introducing e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a
>>>> valid encoding), to avoid this decode step for perhaps almost all
>>>> entries? And if that's not an option, decoding EVEX.W first for all
>>>> the opcodes which previously had no meaning at all would, in some
>>>> cases, reduce the overall number of table entries (and in all other
>>>> cases this would then merely be for consistency, as it also wouldn't
>>>> increase the number of table entries). To give an example:
>>>>
>>>> { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>>>>
>>>> =>
>>>>
>>>> /* PREFIX_EVEX_0F3AC2 */
>>>> {
>>>> { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
>>>> { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
>>>> },
>>>>
>>>> =>
>>>>
>>>> /* EVEX_W_0F3AC2_P_0 */
>>>> {
>>>> { "vcmpph", { XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>>>> },
>>>> /* EVEX_W_0F3AC2_P_1 */
>>>> {
>>>> { "vcmpsh", { XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>>>> },
>>>>
>>>> i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first would
>>>> yield 1 (evex) + 2 (evex_w) + 4 (prefix) entries.
>>>
>>> Hi Jan,
>>>
>>> Do you want me to change it like this?
>>> { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>>>
>>> =>
>>>
>>> /* PREFIX_EVEX_0F3AC2 */
>>> {
>>> { "vcmp%XH", { XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>>> { "vcmp%XH", { XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>>> },
>>>
>>> "XH" => print 'ph', 'sh' depending on the EVEX.ll bit, if EVEX.W==W1 report
>>> bad code.
>>> if (EVEX.LL== EVEX.LLIG)
>>> print 'sh'
>>> else
>>> print 'ph'
>>
>> Not exactly, no. %XH was meant to parallel %XW, which prints 's' or 'd'
>> depending on VEX.W. %XH would print 'h' if EVEX.W is clear and produce an
>> appropriate indication of the encoding being bad if EVEX.W is set.
>> IOW something like
>>
>> /* PREFIX_EVEX_0F3AC2 */
>> {
>> { "vcmpp%XH", { XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>> { "vcmps%XH", { XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>> },
>>
>>>> The delta is even larger for something like MAP5_7D: 1 + 4 + 4 * 2
>>>> vs. 1 + 2 + 4. This also results in more related entries ending up
>>>> closer to one another.
>>>>
>>> I don't quite understand here, should I let all FP16 disassembler go
>>> through W_TABLE first? Or just add something like %XH instead of going
>>> through W_TABLE? Thanks.
>>
>> Where beneficial you will want to decode EVEX.W first, yes. Unless, as per
>> above, you can avoid that decoding step altogether by using %XH.
>>
> I prefer to decode EVEX.W first instead of using %XH.

Well, I can't stop you from avoiding %XH, but I did intentionally say
"Where beneficial you will want to ...". That is, I think that where
possible you should use %XH, and only where that's not suitable decode
EVEX.W (typically earlier than EVEX.pp).
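
To make the %XH idea concrete, here is a rough standalone sketch. It is not
code from i386-dis.c: the function name expand_template, its signature, and
the "(bad)" marker string are assumptions made purely for illustration. It
expands %XW to 's' or 'd' depending on the W bit, and %XH to 'h' when W is
clear or to a bad-encoding marker when W is set.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from i386-dis.c): expand the %XW and %XH
   macros of an instruction template according to the (E)VEX.W bit.
   The result is written to OUT, truncated to OUTSZ bytes.  */
static void
expand_template (const char *tmpl, int w, char *out, size_t outsz)
{
  size_t n = 0;

  if (outsz == 0)
    return;
  out[0] = '\0';

  while (*tmpl != '\0' && n + 1 < outsz)
    {
      char one[2] = { 0, 0 };
      const char *piece;

      if (strncmp (tmpl, "%XW", 3) == 0)
        {
          /* %XW: 's' for W=0, 'd' for W=1.  */
          piece = w ? "d" : "s";
          tmpl += 3;
        }
      else if (strncmp (tmpl, "%XH", 3) == 0)
        {
          /* %XH: 'h' for W=0; W=1 is not a valid encoding.
             The marker text is an assumption of this sketch.  */
          piece = w ? "(bad)" : "h";
          tmpl += 3;
        }
      else
        {
          /* Ordinary character: copy it through unchanged.  */
          one[0] = *tmpl++;
          piece = one;
        }

      n += snprintf (out + n, outsz - n, "%s", piece);
    }
}
```

With this, the two PREFIX_EVEX_0F3AC2 templates "vcmpp%XH" and "vcmps%XH"
expand to "vcmpph" and "vcmpsh" for EVEX.W=0, with no separate EVEX.W table
needed, while EVEX.W=1 yields a visibly bad mnemonic.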
Jan