RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
Date: Tue, 12 Dec 2023 13:15:52 +0000	[thread overview]
Message-ID: <SJ0PR11MB56009F89A7A8DAEAB96B4CA49E8EA@SJ0PR11MB5600.namprd11.prod.outlook.com> (raw)
In-Reply-To: <c5bb54a6-1e7a-4985-9075-cfe3341de32a@suse.com>

> On 12.12.2023 13:32, Cui, Lili wrote:
> >>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template
> >>>>>>> *t)
> >>>>>>>
> >>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
> >>>> variants.  */
> >>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>>> -  {
> >>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
> CpuAVX512VL)))
> >>>>>>> +    {
> >>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
> AVX512F(CpuFMA)
> >>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>>>> APX_F(CpuCMPCCXADD)
> >>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>>>> APX_F(CpuAVX512DQ)
> >>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >> APX_F(CpuBMI2))
> >>>>>>>  	{
> >>>>>>>  	  if (need_evex_encoding ())
> >>>>>>
> >>>>>> There are several issues here:
> >>>>>> - Why did you need to change (to the worse) the original code?
> >>>>>> - Why did you not model the addition after that original code?
> >>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >>>> AVX512
> >>>>>> insn can be VEX-encoded?
> >>>>>
> >>>>>  I don't understand what you mean, we have this combination.
> >>>>>
> >>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>>>
> >>>> Oh, I'm sorry: I forgot about the mask register insns.
> >>>>
> >>>>>> - If these new macros are really needed for whatever reason, they
> >>>> shouldn't
> >>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
> >>>> assembler.
> >>>>>> - Style requires a blank before the opening parenthesis in function
> >>>>>>   invocations (which also covers function-like macro invocations).
> >>>>>>
> >>>>>> I think I asked before: How is it that you get away without
> >>>>>> altering cpu_flags_match(), containing related and quite similar
> logic?
> >>>>>>
> >>>>>
> >>>>> For the original logic ( ... || ... ) && ( ... || ...), the
> >>>>> content in the first bracket
> >>>> and the content in the following brackets can be combined
> >>>> arbitrarily. I think it is Inaccurate.
> >>>>
> >>>> In which way? If there are issues with the existing code, these
> >>>> issues want taking care of in separate (prereq) patches. Of course
> >>>> there are assumptions made here about the CPU combinations that can
> >>>> (and cannot) occur in any of our templates. Similar assumptions are
> >>>> imo
> >> fine to make in the APX additions.
> >>>>
> >>>> Note how I used two nested if()s despite that not having been
> >>>> necessary at that time. I did so in anticipation that for APX you'd
> >>>> want to add another
> >>>> (separate) inner if(), rather than altering the one that's there.
> >>>
> >>> Could we remove the CPU check here? it's a bit ugly and has limited
> >> effectiveness.
> >>>
> >>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>     {
> >>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >> APX_F(CpuCMPCCXADD)
> >>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >> APX_F(CpuAVX512DQ)
> >>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>
> >> I agree on the "a bit ugly" part, but taking what's there right now I
> >> don't understand "has limited effectiveness". Of course you can
> >> remove any code you want, provided you can prove nothing breaks.
> >>
> >
> > Here is install_template().
> > All I can say is that after removing the CPU check, no test cases failed. I
> know it's hard to convince you to delete this place, or what do you suggest to
> do with this? APX requires this, otherwise the test cases will fail.
> >
> > -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> > -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> > -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> > -       {
> 
> So be it then (assuming you don't delete any pre-existing code there). As said,
> I expect this will bite us later.
> 

Done.

+      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
+          || maybe_cpu (t, CpuFMA))
+         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
+         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
+         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
+         || APX_F(CpuBMI2))

> >>>>> Just found cpu_flags_match() has similar logic, I think the
> >>>>> following is the
> >>>> only code related to CPUID alerts, but none of our combinations are
> >>>> related to cpuavx.
> >>>>>
> >>>>>           if (all.bitfield.cpuavx)
> >>>>>             {
> >>>>>               /* We need to check SSE2AVX with AVX.  */
> >>>>>               if (!t->opcode_modifier.sse2avx
> >>>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >>>>>                 match |= CPU_FLAGS_ARCH_MATCH;
> >>>>>             }
> >>>>
> >>>> Not sure why you pick out this one. This special case is needed for
> >>>> sse2avx; I don't see how it's related here. What I've been pointing
> >>>> you at is the code in that function which follows a similar "Dual
> >>>> VEX/EVEX
> >> templates ..."
> >>>> comment.
> >>>>
> >>>
> >>> I know you're talking about this code, I'm just guessing what it
> >>> does? Don't
> >> know what I missed.
> >>
> >> You pulled out this sse2avx code. Hence I was expecting you to tell
> >> me why you consider it relevant here.
> >>
> > Here is cpu_flag_match().
> >
> > I rechecked the code, maybe you want to say I missed the outer loop.
> >
> >       cpu = cpu_flags_and (any, active);
> >       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
> >         {
> >           if (all.bitfield.cpuavx)
> >             {
> >               /* We need to check SSE2AVX with AVX.  */
> >               if (!t->opcode_modifier.sse2avx
> >                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >                 match |= CPU_FLAGS_ARCH_MATCH;
> >             }
> >           else
> >             match |= CPU_FLAGS_ARCH_MATCH;
> >         }
> 
> No, ...
> 
> >>> For example
> >>>
> >>> .arch .nobmi
> >>> andn    (%eax), %eax, %eax
> >>>
> >>> ---------------------------------------------------------------------------------------------
> >>>   if (flag_code != CODE_64BIT)
> >>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
> >>>   else
> >>>     active = cpu_arch_flags;                   ---> cpubmi = 0;
> >>>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
> >>>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
> >>>     {
> >>>     ...
> >>>     }
> >>> Return  CPU_FLAGS_64BIT_MATCH
> >>> --------------------------------------------------------------------
> >>> --
> >>> ------------------------
> >>> Then we will report an arch error.
> >>>
> >>>           if (supported != CPU_FLAGS_PERFECT_MATCH)
> >>>             {
> >>>               as_bad (_("`%s' is not supported on `%s%s'"),
> >>>                       insn_name (current_templates.start),
> >>>                       cpu_arch_name ? cpu_arch_name : default_arch,
> >>>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
> >>>               return NULL;
> >>>             }
> >>
> >> Which is what we want, I think (for the particular example you
> >> picked)? Yet again, I don't think I can see what you're trying to
> >> tell me. I also have to confess I've lost track of whether we're
> >> discussing install_template(), cpu_flag_match(), or both. For example
> >> in install_template() you may indeed be able to get away with little
> >> or no changes, as long as there's no used features tracking for APX (see the
> early ELF-specific part of output_insn()).
> >> Things would be somewhat inconsistent then, but that may be tolerable
> >> (as long as properly justified in the patch description). Not getting
> >> this into proper shape right with the introduction of APX may bite us later,
> though.
> >>
> >
> > Here is cpu_flag_match().
> > I just want to say that for the APX part we don't need to handle it in the
> "Double VEX/EVEX Template...".
> 
> ... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
> understand how you get away without touching that, but if everything
> works, all is fine of course.
> 

Maybe, when FMA is combined with AVX512F, if we disable FMA, but the current instruction belongs to AVX512F, there is no need to report cpu errors for it. But it's different with APX. The combination of APX and BMI requires that both are indispensable.

Thanks,
Lili.

next prev parent reply	other threads:[~2023-12-12 13:15 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30   ` Jan Beulich
2023-12-05 13:31     ` Cui, Lili
2023-12-06  7:52       ` Jan Beulich
2023-12-06 12:43         ` Cui, Lili
2023-12-07  9:01           ` Jan Beulich
2023-12-08  3:10             ` Cui, Lili
2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38   ` Jan Beulich
2023-12-08 15:21     ` Cui, Lili
2023-12-11  8:34       ` Jan Beulich
2023-12-12 10:44         ` Cui, Lili
2023-12-12 11:16           ` Jan Beulich
2023-12-12 12:32             ` Cui, Lili
2023-12-12 12:39               ` Jan Beulich
2023-12-12 13:15                 ` Cui, Lili [this message]
2023-12-12 14:13                   ` Jan Beulich
2023-12-13  7:36                     ` Cui, Lili
2023-12-13  7:48                       ` Jan Beulich
2023-12-12 12:58         ` Cui, Lili
2023-12-12 14:04           ` Jan Beulich
2023-12-13  8:35             ` Cui, Lili
2023-12-13  9:13               ` Jan Beulich
2023-12-07 13:34   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:43       ` Jan Beulich
2023-12-11 11:50   ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:55       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12   ` Jan Beulich
2023-12-11 13:36     ` Cui, Lili
2023-12-11 16:50       ` Jan Beulich
2023-12-13 10:42         ` Cui, Lili
2024-03-22 10:02     ` Jan Beulich
2024-03-22 10:31       ` Jan Beulich
2024-03-26  2:04         ` Cui, Lili
2024-03-26  7:06           ` Jan Beulich
2024-03-26  7:18             ` Cui, Lili
2024-03-22 10:59       ` Jan Beulich
2024-03-26  8:22         ` Cui, Lili
2024-03-26  9:30           ` Jan Beulich
2024-03-27  2:41             ` Cui, Lili
2023-12-08 14:27   ` Jan Beulich
2023-12-12  5:53     ` Cui, Lili
2023-12-12  8:28       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17   ` Jan Beulich
2023-12-15  8:38     ` Cui, Lili
2023-12-15  8:44       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27   ` Jan Beulich
2023-12-12  3:18     ` Hu, Lin1
2023-12-12  8:41       ` Jan Beulich
2023-12-13  5:31         ` Hu, Lin1
2023-12-12  8:45       ` Jan Beulich
2023-12-13  6:06         ` Hu, Lin1
2023-12-13  8:19           ` Jan Beulich
2023-12-13  8:34             ` Hu, Lin1
2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22   ` Cui, Lili
2023-11-24 12:14     ` Jan Beulich
2023-12-12  2:57 ` Lu, Hongjiu
2023-12-12  8:16 ` Cui, Lili

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SJ0PR11MB56009F89A7A8DAEAB96B4CA49E8EA@SJ0PR11MB5600.namprd11.prod.outlook.com \
    --to=lili.cui@intel.com \
    --cc=JBeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=hongjiu.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).