Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions


From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: "hjl.tools@gmail.com" <hjl.tools@gmail.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16 instructions
Date: Tue, 20 Jul 2021 15:02:54 +0200
Message-ID: <0db58f35-ebe7-5d80-d239-b4371ddf18fb@suse.com>
In-Reply-To: <BY5PR11MB400880A49AA4C0B2BE22BBFB9EE29@BY5PR11MB4008.namprd11.prod.outlook.com>

On 20.07.2021 13:26, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, July 20, 2021 4:46 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: FW: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16
>> instructions
>>
>> On 20.07.2021 09:08, Cui, Lili wrote:
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Wednesday, July 14, 2021 11:21 PM
>>>> To: Cui, Lili <lili.cui@intel.com>
>>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>>>> Subject: Re: [PATCH 1/2] [PATCH 1/2] Enable Intel AVX512_FP16
>>>> instructions
>>>>
>>>> On 13.07.2021 08:58, Cui, Lili wrote:
>>>>
>>>> Disassembler:
>>>>
>>>> d_scalar_mode looks to be unused.
>>>>
>>>> This
>>>>
>>>>   /* EVEX_W_MAP5_2A_P_1 */
>>>>   {
>>>>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Ed }, 0 },
>>>>     { "vcvtsi2sh{%LQ|}",	{ XMScalar, VexScalar, EXxEVexR, Eq }, 0 },
>>>>   },
>>>>
>>>> can imo be expressed without decoding EVEX.W, by using Edq instead of
>>>> (separately) Ed and Eq. There's at least one similar case elsewhere.
>>>> Interestingly in the 2si/2usi conversions you do use Gdq already,
>>>> which I think handles the EVEX.W=1 case correctly outside of 64-bit
>>>> mode (unlike Eq, which will unconditionally produce 64-bit register
>>>> names afaict).
>>>>
>>>> As to a broader question on decoding EVEX.W: Did you consider
>>>> introducing e.g. %XH (paralleling %XW, just that EVEX.W=1 is not a
>>>> valid encoding), to avoid this decode step for perhaps almost all
>>>> entries? And if that's not an option, decoding EVEX.W first for all
>>>> the opcodes which previously had no meaning at all would, in some
>>>> cases, reduce the overall number of table entries (and in all other
>>>> cases this would then merely be for consistency, as it also wouldn't
>>>> increase the number of table entries). To give an example:
>>>>
>>>>     { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>>>>
>>>> =>
>>>>
>>>>   /* PREFIX_EVEX_0F3AC2 */
>>>>   {
>>>>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_0) },
>>>>     { VEX_W_TABLE (EVEX_W_0F3AC2_P_1) },
>>>>   },
>>>>
>>>> =>
>>>>
>>>>   /* EVEX_W_0F3AC2_P_0 */
>>>>   {
>>>>     { "vcmpph",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>>>>   },
>>>>   /* EVEX_W_0F3AC2_P_1 */
>>>>   {
>>>>     { "vcmpsh",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>>>>   },
>>>>
>>>> i.e. a total of 1 + 4 + 2 * 2 entries. Whereas decoding W first would
>>>> yield 1 (evex) + 2 (evex_w) + 4 (prefix) entries.
>>>
>>> Hi Jan,
>>>
>>> Do you want me to change it like this?
>>>      { PREFIX_TABLE (PREFIX_EVEX_0F3AC2) },
>>>
>>>  =>
>>>
>>>    /* PREFIX_EVEX_0F3AC2 */
>>>    {
>>>      { "vcmp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>>>      { "vcmp%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>>>    },
>>>
>>> "XH" => print 'ph' or 'sh' depending on the EVEX.LL bits; if
>>> EVEX.W==W1, report bad code.
>>> if (EVEX.LL == EVEX.LLIG)
>>>       print 'sh'
>>> else
>>>       print 'ph'
>>
>> Not exactly, no. %XH was meant to parallel %XW, which prints 's' or 'd'
>> depending on VEX.W. %XH would print 'h' if EVEX.W is clear and produce an
>> appropriate indication of the encoding being bad if EVEX.W is set.
>> IOW something like
>>
>>    /* PREFIX_EVEX_0F3AC2 */
>>    {
>>      { "vcmpp%XH",	{ XMask, Vex, EXxh, EXxEVexS, Ib }, 0 },
>>      { "vcmps%XH",	{ XMask, VexScalar, EXxmm_mw, EXxEVexS, Ib }, 0 },
>>    },
>>
>>>> The delta is even larger for something like MAP5_7D: 1 + 4 + 4 * 2
>>>> vs. 1 + 2 + 4. This also results in more related entries ending up
>>>> closer to one another.
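
The entry counts quoted above can be reproduced with a few lines of C. This is only a sketch of the arithmetic, not code from the patch: the names PREFIX_SLOTS, W_SLOTS, entries_prefix_first and entries_w_first are made up for illustration, and it assumes a prefix table always has 4 slots, a W table always has 2, and only EVEX.W=0 leads to valid encodings.

```c
#include <assert.h>

/* Assumed table sizes: 4 slots per prefix table, 2 per EVEX.W table.  */
enum { PREFIX_SLOTS = 4, W_SLOTS = 2 };

/* Prefix decoded first: one opcode-table entry, one prefix table, and
   one W table for every prefix slot that is actually in use.  */
static int
entries_prefix_first (int used_prefix_slots)
{
  return 1 + PREFIX_SLOTS + used_prefix_slots * W_SLOTS;
}

/* EVEX.W decoded first: one opcode-table entry, one W table, and a
   single prefix table hanging off the W=0 slot (W=1 is assumed to be
   invalid, so it needs no further table).  */
static int
entries_w_first (void)
{
  return 1 + W_SLOTS + PREFIX_SLOTS;
}
```

With two prefix slots in use (the 0F3AC2 case) this gives 9 entries against 7, and with all four in use (the MAP5_7D case) 13 against 7.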
>>>>
>>> I don't quite understand here. Should I let all FP16 disassembler
>>> entries go through W_TABLE first, or just add something like %XH
>>> instead of going through W_TABLE? Thanks.
>>
>> Where beneficial you will want to decode EVEX.W first, yes. Unless, as per
>> above, you can avoid that decoding step altogether by using %XH.
>>
> I prefer to decode EVEX.W first instead of using %XH.

Well, I can't stop you from avoiding %XH, but I did intentionally say
"Where beneficial you will want to ...". That is, I think that where
possible you should use %XH, and only where that's not suitable decode
EVEX.W (typically earlier than EVEX.pp).
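
To make the intended %XH behaviour concrete, here is a rough standalone sketch. It is not taken from i386-dis.c or from the patch: expand_xh and the BAD_ENCODING marker are hypothetical names, and the real disassembler would handle the escape inside its existing template expansion rather than in a separate helper. The only behaviour it illustrates is the one described above: emit 'h' when EVEX.W is clear, and flag the encoding as bad when EVEX.W is set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker for an invalid encoding; the real disassembler
   has its own way of reporting this.  */
#define BAD_ENCODING "(bad)"

/* Copy TMPL into OUT (capacity OUTSZ), replacing every "%XH" with "h"
   when EVEX_W is 0 and with BAD_ENCODING otherwise.
   Returns 0 on success, -1 if OUT is too small.  */
static int
expand_xh (const char *tmpl, int evex_w, char *out, size_t outsz)
{
  size_t n = 0;

  if (outsz == 0)
    return -1;

  while (*tmpl != '\0')
    {
      const char *piece;
      size_t len;

      if (strncmp (tmpl, "%XH", 3) == 0)
        {
          piece = evex_w ? BAD_ENCODING : "h";
          len = strlen (piece);
          tmpl += 3;
        }
      else
        {
          /* Ordinary character: copy it through unchanged.  */
          piece = tmpl;
          len = 1;
          tmpl += 1;
        }

      /* Always leave room for the terminating NUL.  */
      if (n + len >= outsz)
        return -1;
      memcpy (out + n, piece, len);
      n += len;
    }

  out[n] = '\0';
  return 0;
}
```

With this, the templates "vcmpp%XH" and "vcmps%XH" expand to "vcmpph" and "vcmpsh" for EVEX.W=0, so a single prefix table can cover both forms without a separate W table.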

Jan
