public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] x86: add missing APX logic to cpu_flags_match()
@ 2024-01-05 12:15 Jan Beulich
  2024-01-08  3:17 ` Cui, Lili
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-01-05 12:15 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu, Lili Cui

As already indicated during review, we can't get away without certain
adjustments here: Without these, respective {evex}-prefixed insns are
assembled to APX encodings even when APX_F is turned off.

While there also extend the respective comment in the opcode table, to
explain why this construct is used.
---
Strictly speaking we could go with just cpuid|APX_F in the templates,
with the assertions dropped and instead an "else" added. But I think
we're better off this way, for being less prone to introducing mistakes
later on.

The resulting diagnostics aren't quite correct (because of not
mentioning the {evex} prefix, which really is what's the problem there),
but improving diagnostics is a wider topic anyway.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
 	      any.bitfield.cpuavx512vl = 0;
 	    }
 	}
+
+      /* Dual non-APX/APX templates need massaging from what APX_F() in the
+         opcode table has produced.  While the direct transformation of the
+         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) / cpuid&(APX_F)
+         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
+         instead.  */
+      if (any.bitfield.cpuapx_f
+	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
+	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
+	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
+	      || any.bitfield.cpucmpccxadd))
+	{
+	  /* These checks (verifying that APX_F() was properly used in the
+	     opcode table entry) make sure there's no need for an "else" to
+	     the "if()" below.  */
+	  gas_assert (!cpu_flags_all_zero (&all));
+	  cpu = cpu_flags_and (all, any);
+	  gas_assert (cpu_flags_equal (&cpu, &all));
+
+	  if (need_evex_encoding (t))
+	    all = any;
+
+	  memset (&any, 0, sizeof (any));
+	}
     }
 
   if (flag_code != CODE_64BIT)
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -143,7 +143,13 @@
 
 #define DstVVVV VexVVVV=VexVVVV_DST
 
-// The template supports VEX format for cpuid and EVEX format for cpuid & apx_f.
+// The template supports VEX format for cpuid and EVEX format for cpuid & APX_F.
+// While therefore we really mean cpuid|(cpuid&APX_F) here, this can't be
+// expressed in the generated templates.  It's equivalent to just cpuid|APX_F
+// anyway, but that is not what we want (as APX_F alone isn't a sufficient
+// prereq for such insns). Instead the assembler will massage the CPU specifier
+// to the equivalent of either cpuid&(cpuid) or cpuid&(APX_F) (or something
+// substantially similar), depending on what encoding was requested.
 #define APX_F(cpuid) cpuid&(cpuid|APX_F)
 
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
@@ -13,6 +13,13 @@
 .*:25: Error: `andn' is not supported on `x86_64.nobmi'
 .*:28: Error: `bzhi' is not supported on `x86_64.nobmi2'
 .*:29: Error: `bzhi' is not supported on `x86_64.nobmi2'
+.*:33: Error: .*`andn'.*
+.*:34: Error: .*`bzhi'.*
+.*:35: Error: .*`kmovw'.*
+.*:36: Error: .*`kmovq'.*
+.*:37: Error: .*`kmovb'.*
+.*:38: Error: .*`ldtilecfg'.*
+.*:39: Error: .*`cmpexadd'.*
 GAS LISTING .*
 #...
 [ 	]*1[ 	]+\# Check illegal 64bit APX EVEX promoted instructions
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
@@ -27,3 +27,13 @@
 	.arch .nobmi2
 	bzhi %r16,%r15,%r11
 	bzhi %r15,%r15,%r11
+
+	.arch default
+	.arch .noapx_f
+	{evex} andn %r15, %r15, %r11
+	{evex} bzhi %r15, %r15, %r11
+	{evex} kmovw %k1, %r8d
+	{evex} kmovq %k1, %r8
+	{evex} kmovb %k1, %r8d
+	{evex} ldtilecfg (%r8)
+	{evex} cmpexadd %rax, %rcx, (%r8)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-05 12:15 [PATCH] x86: add missing APX logic to cpu_flags_match() Jan Beulich
@ 2024-01-08  3:17 ` Cui, Lili
  2024-01-08  7:56   ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Cui, Lili @ 2024-01-08  3:17 UTC (permalink / raw)
  To: Beulich, Jan, Binutils; +Cc: H.J. Lu

> As already indicated during review, we can't get away without certain
> adjustments here: Without these, respective {evex}-prefixed insns are
> assembled to APX encodings even when APX_F is turned off.
> 
> While there also extend the respective comment in the opcode table, to
> explain why this construct is used.
> ---
> Strictly speaking we could go with just cpuid|APX_F in the templates, with the
> assertions dropped and instead an "else" added. But I think we're better off
> this way, for being less prone to introducing mistakes later on.
> 
> The resulting diagnostics aren't quite correct (because of not mentioning the
> {evex} prefix, which really is what's the problem there), but improving
> diagnostics is a wider topic anyway.
> 

Oh, get it, thanks.
Currently, a lot of ugly special handling has been added to i386 in order to merge these vex/evex insns, which increases the complexity of the code. Personally, I don't think it's worth it. Other issues may arise later.

> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
>  	      any.bitfield.cpuavx512vl = 0;
>  	    }
>  	}
> +
> +      /* Dual non-APX/APX templates need massaging from what APX_F() in the
> +         opcode table has produced.  While the direct transformation of the
> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) /
> cpuid&(APX_F)
> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
> +         instead.  */
> +      if (any.bitfield.cpuapx_f
> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
> +	      || any.bitfield.cpucmpccxadd))
> +	{
> +	  /* These checks (verifying that APX_F() was properly used in the
> +	     opcode table entry) make sure there's no need for an "else" to
> +	     the "if()" below.  */
> +	  gas_assert (!cpu_flags_all_zero (&all));
> +	  cpu = cpu_flags_and (all, any);
> +	  gas_assert (cpu_flags_equal (&cpu, &all));
> +
> +	  if (need_evex_encoding (t))
> +	    all = any;
> +

> +	  memset (&any, 0, sizeof (any));

Wouldn't it make sense to put it in the else branch and clean out APX-F specifically? Just like you did before.

  if (need_evex_encoding (t))
    all = any;
else
   any.bitfield.cpuapx_f = 0;

thanks,
Lili.

> +	}
>      }
> 
>    if (flag_code != CODE_64BIT)
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -143,7 +143,13 @@
> 
>  #define DstVVVV VexVVVV=VexVVVV_DST
> 
> -// The template supports VEX format for cpuid and EVEX format for cpuid &
> apx_f.
> +// The template supports VEX format for cpuid and EVEX format for cpuid &
> APX_F.
> +// While therefore we really mean cpuid|(cpuid&APX_F) here, this can't
> +be // expressed in the generated templates.  It's equivalent to just
> +cpuid|APX_F // anyway, but that is not what we want (as APX_F alone
> +isn't a sufficient // prereq for such insns). Instead the assembler
> +will massage the CPU specifier // to the equivalent of either
> +cpuid&(cpuid) or cpuid&(APX_F) (or something // substantially similar),
> depending on what encoding was requested.
>  #define APX_F(cpuid) cpuid&(cpuid|APX_F)
> 
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-
> use
> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
> @@ -13,6 +13,13 @@
>  .*:25: Error: `andn' is not supported on `x86_64.nobmi'
>  .*:28: Error: `bzhi' is not supported on `x86_64.nobmi2'
>  .*:29: Error: `bzhi' is not supported on `x86_64.nobmi2'
> +.*:33: Error: .*`andn'.*
> +.*:34: Error: .*`bzhi'.*
> +.*:35: Error: .*`kmovw'.*
> +.*:36: Error: .*`kmovq'.*
> +.*:37: Error: .*`kmovb'.*
> +.*:38: Error: .*`ldtilecfg'.*
> +.*:39: Error: .*`cmpexadd'.*
>  GAS LISTING .*
>  #...
>  [ 	]*1[ 	]+\# Check illegal 64bit APX EVEX promoted instructions
> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
> @@ -27,3 +27,13 @@
>  	.arch .nobmi2
>  	bzhi %r16,%r15,%r11
>  	bzhi %r15,%r15,%r11
> +
> +	.arch default
> +	.arch .noapx_f
> +	{evex} andn %r15, %r15, %r11
> +	{evex} bzhi %r15, %r15, %r11
> +	{evex} kmovw %k1, %r8d
> +	{evex} kmovq %k1, %r8
> +	{evex} kmovb %k1, %r8d
> +	{evex} ldtilecfg (%r8)
> +	{evex} cmpexadd %rax, %rcx, (%r8)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-08  3:17 ` Cui, Lili
@ 2024-01-08  7:56   ` Jan Beulich
  2024-01-08  8:30     ` Cui, Lili
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-01-08  7:56 UTC (permalink / raw)
  To: Cui, Lili; +Cc: H.J. Lu, Binutils

On 08.01.2024 04:17, Cui, Lili wrote:
>> As already indicated during review, we can't get away without certain
>> adjustments here: Without these, respective {evex}-prefixed insns are
>> assembled to APX encodings even when APX_F is turned off.
>>
>> While there also extend the respective comment in the opcode table, to
>> explain why this construct is used.
>> ---
>> Strictly speaking we could go with just cpuid|APX_F in the templates, with the
>> assertions dropped and instead an "else" added. But I think we're better off
>> this way, for being less prone to introducing mistakes later on.
>>
>> The resulting diagnostics aren't quite correct (because of not mentioning the
>> {evex} prefix, which really is what's the problem there), but improving
>> diagnostics is a wider topic anyway.
> 
> Oh, get it, thanks.
> Currently, a lot of ugly special handling has been added to i386 in order to merge these vex/evex insns, which increases the complexity of the code. Personally, I don't think it's worth it. Other issues may arise later.

Well, if it was just for the saving of a few templates, it probably indeed
wouldn't be worth it. But the set of templates saved is quite significant
overall, hence why I did it in the first place. The number of templates to
consult does, after all, matter when processing a respective insn.

>> --- a/gas/config/tc-i386.c
>> +++ b/gas/config/tc-i386.c
>> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
>>  	      any.bitfield.cpuavx512vl = 0;
>>  	    }
>>  	}
>> +
>> +      /* Dual non-APX/APX templates need massaging from what APX_F() in the
>> +         opcode table has produced.  While the direct transformation of the
>> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) /
>> cpuid&(APX_F)
>> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
>> +         instead.  */
>> +      if (any.bitfield.cpuapx_f
>> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
>> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
>> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
>> +	      || any.bitfield.cpucmpccxadd))
>> +	{
>> +	  /* These checks (verifying that APX_F() was properly used in the
>> +	     opcode table entry) make sure there's no need for an "else" to
>> +	     the "if()" below.  */
>> +	  gas_assert (!cpu_flags_all_zero (&all));
>> +	  cpu = cpu_flags_and (all, any);
>> +	  gas_assert (cpu_flags_equal (&cpu, &all));
>> +
>> +	  if (need_evex_encoding (t))
>> +	    all = any;
>> +
> 
>> +	  memset (&any, 0, sizeof (any));
> 
> Wouldn't it make sense to put it in the else branch and clean out APX-F specifically? Just like you did before.
> 
>   if (need_evex_encoding (t))
>     all = any;
> else
>    any.bitfield.cpuapx_f = 0;

That was an alternative I did consider, yes, but the way I've done it is
overall more self-consistent imo, at the expense of being less consistent
with the AVX/AVX512 logic (the moving of "any" to "all" isn't consistent
with that anyway).

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-08  7:56   ` Jan Beulich
@ 2024-01-08  8:30     ` Cui, Lili
  2024-01-08  8:58       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Cui, Lili @ 2024-01-08  8:30 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: H.J. Lu, Binutils

> >> --- a/gas/config/tc-i386.c
> >> +++ b/gas/config/tc-i386.c
> >> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
> >>  	      any.bitfield.cpuavx512vl = 0;
> >>  	    }
> >>  	}
> >> +
> >> +      /* Dual non-APX/APX templates need massaging from what APX_F() in
> the
> >> +         opcode table has produced.  While the direct transformation of the
> >> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) /
> >> cpuid&(APX_F)
> >> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
> >> +         instead.  */
> >> +      if (any.bitfield.cpuapx_f
> >> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
> >> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
> >> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
> >> +	      || any.bitfield.cpucmpccxadd))
> >> +	{
> >> +	  /* These checks (verifying that APX_F() was properly used in the
> >> +	     opcode table entry) make sure there's no need for an "else" to
> >> +	     the "if()" below.  */
> >> +	  gas_assert (!cpu_flags_all_zero (&all));
> >> +	  cpu = cpu_flags_and (all, any);
> >> +	  gas_assert (cpu_flags_equal (&cpu, &all));
> >> +
> >> +	  if (need_evex_encoding (t))
> >> +	    all = any;
> >> +
> >
> >> +	  memset (&any, 0, sizeof (any));
> >
> > Wouldn't it make sense to put it in the else branch and clean out APX-F
> specifically? Just like you did before.
> >
> >   if (need_evex_encoding (t))
> >     all = any;
> > else
> >    any.bitfield.cpuapx_f = 0;
> 
> That was an alternative I did consider, yes, but the way I've done it is overall
> more self-consistent imo, at the expense of being less consistent with the
> AVX/AVX512 logic (the moving of "any" to "all" isn't consistent with that
> anyway).
> 

  memset (&any, 0, sizeof (any)); 

I'd say this would make "any" not match the actual value, which might be used later, but it's been cleared here.

Lili.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-08  8:30     ` Cui, Lili
@ 2024-01-08  8:58       ` Jan Beulich
  2024-01-08 10:28         ` Cui, Lili
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-01-08  8:58 UTC (permalink / raw)
  To: Cui, Lili; +Cc: H.J. Lu, Binutils

On 08.01.2024 09:30, Cui, Lili wrote:
>>>> --- a/gas/config/tc-i386.c
>>>> +++ b/gas/config/tc-i386.c
>>>> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
>>>>  	      any.bitfield.cpuavx512vl = 0;
>>>>  	    }
>>>>  	}
>>>> +
>>>> +      /* Dual non-APX/APX templates need massaging from what APX_F() in
>> the
>>>> +         opcode table has produced.  While the direct transformation of the
>>>> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) /
>>>> cpuid&(APX_F)
>>>> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
>>>> +         instead.  */
>>>> +      if (any.bitfield.cpuapx_f
>>>> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
>>>> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
>>>> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
>>>> +	      || any.bitfield.cpucmpccxadd))
>>>> +	{
>>>> +	  /* These checks (verifying that APX_F() was properly used in the
>>>> +	     opcode table entry) make sure there's no need for an "else" to
>>>> +	     the "if()" below.  */
>>>> +	  gas_assert (!cpu_flags_all_zero (&all));
>>>> +	  cpu = cpu_flags_and (all, any);
>>>> +	  gas_assert (cpu_flags_equal (&cpu, &all));
>>>> +
>>>> +	  if (need_evex_encoding (t))
>>>> +	    all = any;
>>>> +
>>>
>>>> +	  memset (&any, 0, sizeof (any));
>>>
>>> Wouldn't it make sense to put it in the else branch and clean out APX-F
>> specifically? Just like you did before.
>>>
>>>   if (need_evex_encoding (t))
>>>     all = any;
>>> else
>>>    any.bitfield.cpuapx_f = 0;
>>
>> That was an alternative I did consider, yes, but the way I've done it is overall
>> more self-consistent imo, at the expense of being less consistent with the
>> AVX/AVX512 logic (the moving of "any" to "all" isn't consistent with that
>> anyway).
>>
> 
>   memset (&any, 0, sizeof (any)); 
> 
> I'd say this would make "any" not match the actual value, which might be used later, but it's been cleared here.

I'm afraid I don't get what you're trying to tell me.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-08  8:58       ` Jan Beulich
@ 2024-01-08 10:28         ` Cui, Lili
  2024-01-08 10:38           ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Cui, Lili @ 2024-01-08 10:28 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: H.J. Lu, Binutils

> On 08.01.2024 09:30, Cui, Lili wrote:
> >>>> --- a/gas/config/tc-i386.c
> >>>> +++ b/gas/config/tc-i386.c
> >>>> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
> >>>>  	      any.bitfield.cpuavx512vl = 0;
> >>>>  	    }
> >>>>  	}
> >>>> +
> >>>> +      /* Dual non-APX/APX templates need massaging from what
> >>>> + APX_F() in
> >> the
> >>>> +         opcode table has produced.  While the direct transformation of the
> >>>> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) /
> >>>> cpuid&(APX_F)
> >>>> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
> >>>> +         instead.  */
> >>>> +      if (any.bitfield.cpuapx_f
> >>>> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
> >>>> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
> >>>> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
> >>>> +	      || any.bitfield.cpucmpccxadd))
> >>>> +	{
> >>>> +	  /* These checks (verifying that APX_F() was properly used in the
> >>>> +	     opcode table entry) make sure there's no need for an "else" to
> >>>> +	     the "if()" below.  */
> >>>> +	  gas_assert (!cpu_flags_all_zero (&all));
> >>>> +	  cpu = cpu_flags_and (all, any);
> >>>> +	  gas_assert (cpu_flags_equal (&cpu, &all));
> >>>> +
> >>>> +	  if (need_evex_encoding (t))
> >>>> +	    all = any;
> >>>> +
> >>>
> >>>> +	  memset (&any, 0, sizeof (any));
> >>>
> >>> Wouldn't it make sense to put it in the else branch and clean out
> >>> APX-F
> >> specifically? Just like you did before.
> >>>
> >>>   if (need_evex_encoding (t))
> >>>     all = any;
> >>> else
> >>>    any.bitfield.cpuapx_f = 0;
> >>
> >> That was an alternative I did consider, yes, but the way I've done it
> >> is overall more self-consistent imo, at the expense of being less
> >> consistent with the
> >> AVX/AVX512 logic (the moving of "any" to "all" isn't consistent with
> >> that anyway).
> >>
> >
> >   memset (&any, 0, sizeof (any));
> >
> > I'd say this would make "any" not match the actual value, which might be
> used later, but it's been cleared here.
> 
> I'm afraid I don't get what you're trying to tell me.
> 

What I mean is that memset will clear the variable "any", there is no problem in handling it this way. But I think the following way is more reasonable.

For evex it should be:
any.bitfield.cpubmi = 1
any.bitfield.cpuapx_f = 1

For vex it should be:
any.bitfield.cpubmi = 1
any.bitfield.cpuapx_f = 0

Instead of clearing all values in "any".

Lili.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-08 10:28         ` Cui, Lili
@ 2024-01-08 10:38           ` Jan Beulich
  2024-01-09  5:36             ` Cui, Lili
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-01-08 10:38 UTC (permalink / raw)
  To: Cui, Lili; +Cc: H.J. Lu, Binutils

On 08.01.2024 11:28, Cui, Lili wrote:
>> On 08.01.2024 09:30, Cui, Lili wrote:
>>>>>> --- a/gas/config/tc-i386.c
>>>>>> +++ b/gas/config/tc-i386.c
>>>>>> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
>>>>>>  	      any.bitfield.cpuavx512vl = 0;
>>>>>>  	    }
>>>>>>  	}
>>>>>> +
>>>>>> +      /* Dual non-APX/APX templates need massaging from what
>>>>>> + APX_F() in
>>>> the
>>>>>> +         opcode table has produced.  While the direct transformation of the
>>>>>> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid) /
>>>>>> cpuid&(APX_F)
>>>>>> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
>>>>>> +         instead.  */
>>>>>> +      if (any.bitfield.cpuapx_f
>>>>>> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
>>>>>> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
>>>>>> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
>>>>>> +	      || any.bitfield.cpucmpccxadd))
>>>>>> +	{
>>>>>> +	  /* These checks (verifying that APX_F() was properly used in the
>>>>>> +	     opcode table entry) make sure there's no need for an "else" to
>>>>>> +	     the "if()" below.  */
>>>>>> +	  gas_assert (!cpu_flags_all_zero (&all));
>>>>>> +	  cpu = cpu_flags_and (all, any);
>>>>>> +	  gas_assert (cpu_flags_equal (&cpu, &all));
>>>>>> +
>>>>>> +	  if (need_evex_encoding (t))
>>>>>> +	    all = any;
>>>>>> +
>>>>>
>>>>>> +	  memset (&any, 0, sizeof (any));
>>>>>
>>>>> Wouldn't it make sense to put it in the else branch and clean out
>>>>> APX-F
>>>> specifically? Just like you did before.
>>>>>
>>>>>   if (need_evex_encoding (t))
>>>>>     all = any;
>>>>> else
>>>>>    any.bitfield.cpuapx_f = 0;
>>>>
>>>> That was an alternative I did consider, yes, but the way I've done it
>>>> is overall more self-consistent imo, at the expense of being less
>>>> consistent with the
>>>> AVX/AVX512 logic (the moving of "any" to "all" isn't consistent with
>>>> that anyway).
>>>>
>>>
>>>   memset (&any, 0, sizeof (any));
>>>
>>> I'd say this would make "any" not match the actual value, which might be
>> used later, but it's been cleared here.
>>
>> I'm afraid I don't get what you're trying to tell me.
>>
> 
> What I mean is that memset will clear the variable "any", there is no problem in handling it this way. But I think the following way is more reasonable.
> 
> For evex it should be:
> any.bitfield.cpubmi = 1
> any.bitfield.cpuapx_f = 1
> 
> For vex it should be:
> any.bitfield.cpubmi = 1
> any.bitfield.cpuapx_f = 0
> 
> Instead of clearing all values in "any".

But why would you want to have the same in "any" that you already have in
"all"? That would incur extra checks later in the function for no gain.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-08 10:38           ` Jan Beulich
@ 2024-01-09  5:36             ` Cui, Lili
  2024-01-09  8:30               ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Cui, Lili @ 2024-01-09  5:36 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: H.J. Lu, Binutils

> On 08.01.2024 11:28, Cui, Lili wrote:
> >> On 08.01.2024 09:30, Cui, Lili wrote:
> >>>>>> --- a/gas/config/tc-i386.c
> >>>>>> +++ b/gas/config/tc-i386.c
> >>>>>> @@ -1940,6 +1940,30 @@ cpu_flags_match (const insn_template *t)
> >>>>>>  	      any.bitfield.cpuavx512vl = 0;
> >>>>>>  	    }
> >>>>>>  	}
> >>>>>> +
> >>>>>> +      /* Dual non-APX/APX templates need massaging from what
> >>>>>> + APX_F() in
> >>>> the
> >>>>>> +         opcode table has produced.  While the direct transformation of
> the
> >>>>>> +         incoming cpuid&(cpuid|APX_F) would be to cpuid&(cpuid)
> >>>>>> + /
> >>>>>> cpuid&(APX_F)
> >>>>>> +         respectively, it's cheaper to move to just cpuid / cpuid&APX_F
> >>>>>> +         instead.  */
> >>>>>> +      if (any.bitfield.cpuapx_f
> >>>>>> +	  && (any.bitfield.cpubmi || any.bitfield.cpubmi2
> >>>>>> +	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
> >>>>>> +	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
> >>>>>> +	      || any.bitfield.cpucmpccxadd))
> >>>>>> +	{
> >>>>>> +	  /* These checks (verifying that APX_F() was properly used in
> the
> >>>>>> +	     opcode table entry) make sure there's no need for an "else"
> to
> >>>>>> +	     the "if()" below.  */
> >>>>>> +	  gas_assert (!cpu_flags_all_zero (&all));
> >>>>>> +	  cpu = cpu_flags_and (all, any);
> >>>>>> +	  gas_assert (cpu_flags_equal (&cpu, &all));
> >>>>>> +
> >>>>>> +	  if (need_evex_encoding (t))
> >>>>>> +	    all = any;
> >>>>>> +
> >>>>>
> >>>>>> +	  memset (&any, 0, sizeof (any));
> >>>>>
> >>>>> Wouldn't it make sense to put it in the else branch and clean out
> >>>>> APX-F
> >>>> specifically? Just like you did before.
> >>>>>
> >>>>>   if (need_evex_encoding (t))
> >>>>>     all = any;
> >>>>> else
> >>>>>    any.bitfield.cpuapx_f = 0;
> >>>>
> >>>> That was an alternative I did consider, yes, but the way I've done
> >>>> it is overall more self-consistent imo, at the expense of being
> >>>> less consistent with the
> >>>> AVX/AVX512 logic (the moving of "any" to "all" isn't consistent
> >>>> with that anyway).
> >>>>
> >>>
> >>>   memset (&any, 0, sizeof (any));
> >>>
> >>> I'd say this would make "any" not match the actual value, which
> >>> might be
> >> used later, but it's been cleared here.
> >>
> >> I'm afraid I don't get what you're trying to tell me.
> >>
> >
> > What I mean is that memset will clear the variable "any", there is no problem
> in handling it this way. But I think the following way is more reasonable.
> >
> > For evex it should be:
> > any.bitfield.cpubmi = 1
> > any.bitfield.cpuapx_f = 1
> >
> > For vex it should be:
> > any.bitfield.cpubmi = 1
> > any.bitfield.cpuapx_f = 0
> >
> > Instead of clearing all values in "any".
> 
> But why would you want to have the same in "any" that you already have in
> "all"? That would incur extra checks later in the function for no gain.
> 

Same value but completely different meaning, if you think the current one is better, that's fine with me. By the way, I have a question about " t->cpu_any", is it specific to dual VEX/EVEX templates?

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-09  5:36             ` Cui, Lili
@ 2024-01-09  8:30               ` Jan Beulich
  2024-01-09 11:00                 ` Cui, Lili
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-01-09  8:30 UTC (permalink / raw)
  To: Cui, Lili; +Cc: H.J. Lu, Binutils

On 09.01.2024 06:36, Cui, Lili wrote:
> By the way, I have a question about " t->cpu_any", is it specific to dual VEX/EVEX templates?

No. And I think it's used elsewhere already - see e.g. SFENCE.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-09  8:30               ` Jan Beulich
@ 2024-01-09 11:00                 ` Cui, Lili
  2024-01-09 11:07                   ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Cui, Lili @ 2024-01-09 11:00 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: H.J. Lu, Binutils

> Subject: Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
> 
> On 09.01.2024 06:36, Cui, Lili wrote:
> > By the way, I have a question about " t->cpu_any", is it specific to dual
> VEX/EVEX templates?
> 
> No. And I think it's used elsewhere already - see e.g. SFENCE.
> 
Oh, I get it. "any" means that any cpuid supports this instruction. It should be cleared for APX (Daul VEX/EVEX). IOW, "any" doesn't work with "cpuid&(cpuid|APX_F)".

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-09 11:00                 ` Cui, Lili
@ 2024-01-09 11:07                   ` Jan Beulich
  2024-01-10  1:44                     ` Cui, Lili
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-01-09 11:07 UTC (permalink / raw)
  To: Cui, Lili; +Cc: H.J. Lu, Binutils

On 09.01.2024 12:00, Cui, Lili wrote:
>> Subject: Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
>>
>> On 09.01.2024 06:36, Cui, Lili wrote:
>>> By the way, I have a question about " t->cpu_any", is it specific to dual
>> VEX/EVEX templates?
>>
>> No. And I think it's used elsewhere already - see e.g. SFENCE.
>>
> Oh, I get it. "any" means that any cpuid supports this instruction. It should be cleared for APX (Daul VEX/EVEX). IOW, "any" doesn't work with "cpuid&(cpuid|APX_F)".

Well, in a way it does, as outlined in a comment: <cpuid>&(APX_F)
(all.<cpuid>=true, any.apx_f=true, all other bits of any clear)
simply is the same as <cpuid>&APX_F (all.<cpuid>=true +
all.apx_f=true). The crucial part here is that "any" may not have
any other bits set then (after the purging of the one <cpuid> one).
And it was only while making the change that I noticed that doing
things the way the patch is now doing them is overall more neat
(with the same functional effect). As mentioned, we may still want
to consider to simplify (or even purge) APX_F() as a result.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] x86: add missing APX logic to cpu_flags_match()
  2024-01-09 11:07                   ` Jan Beulich
@ 2024-01-10  1:44                     ` Cui, Lili
  0 siblings, 0 replies; 12+ messages in thread
From: Cui, Lili @ 2024-01-10  1:44 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: H.J. Lu, Binutils

> 
> On 09.01.2024 12:00, Cui, Lili wrote:
> >> Subject: Re: [PATCH] x86: add missing APX logic to cpu_flags_match()
> >>
> >> On 09.01.2024 06:36, Cui, Lili wrote:
> >>> By the way, I have a question about " t->cpu_any", is it specific to
> >>> dual
> >> VEX/EVEX templates?
> >>
> >> No. And I think it's used elsewhere already - see e.g. SFENCE.
> >>
> > Oh, I get it. "any" means that any cpuid supports this instruction. It should
> be cleared for APX (Daul VEX/EVEX). IOW, "any" doesn't work with
> "cpuid&(cpuid|APX_F)".
> 
> Well, in a way it does, as outlined in a comment: <cpuid>&(APX_F)
> (all.<cpuid>=true, any.apx_f=true, all other bits of any clear) simply is the same
> as <cpuid>&APX_F (all.<cpuid>=true + all.apx_f=true). The crucial part here is
> that "any" may not have any other bits set then (after the purging of the one
> <cpuid> one).
> And it was only while making the change that I noticed that doing things the
> way the patch is now doing them is overall more neat (with the same
> functional effect). As mentioned, we may still want to consider to simplify (or
> even purge) APX_F() as a result.
> 

Yes, agree.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-01-10  1:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-05 12:15 [PATCH] x86: add missing APX logic to cpu_flags_match() Jan Beulich
2024-01-08  3:17 ` Cui, Lili
2024-01-08  7:56   ` Jan Beulich
2024-01-08  8:30     ` Cui, Lili
2024-01-08  8:58       ` Jan Beulich
2024-01-08 10:28         ` Cui, Lili
2024-01-08 10:38           ` Jan Beulich
2024-01-09  5:36             ` Cui, Lili
2024-01-09  8:30               ` Jan Beulich
2024-01-09 11:00                 ` Cui, Lili
2024-01-09 11:07                   ` Jan Beulich
2024-01-10  1:44                     ` Cui, Lili

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).