From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=+aEj=HP=suse.com=jbeulich@sourceware.org>
Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333])
	by sourceware.org (Postfix) with ESMTPS id E62EC3858D28
	for <binutils@sourceware.org>; Mon,  4 Dec 2023 16:30:27 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E62EC3858D28
Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E62EC3858D28
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::333
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701707431; cv=none;
	b=fS0ZZNe8F2fbtjCJY44ahv7DYEQjsOUmyVmFSYwQyT8ilGnCGXZwfeywtWgOg+YwBQwNnfdknL46Ql5dwDoA+KmycVGdggHYVntAjlmzFT/uNMcMCuvxeJyjVkYgPG2Vcn6slVs9cp0R+xsoex/yfePBo7S+EabSvuh9JwTZv9c=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1701707431; c=relaxed/simple;
	bh=wHbBOQ18pSyw8crmsJuioG9S72hlX9ynd3QDIRLtxFo=;
	h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=wcV4Ha2jruyUups5VZExPq9lc1GU6kaXjFuWzie7QZqXL0fin/INpNVdOJ8Ysuvx7rbv2uu/+IK3veygtZa3W2eJi92SElBkL/DYEqGlpstYzYar1wtZkfEJIQofipQ2sYWiQ+806SfnUE5MvMFcMpxhzlFgT714uErBhwizxFM=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-wm1-x333.google.com with SMTP id 5b1f17b1804b1-40c0a11a914so13346895e9.2
        for <binutils@sourceware.org>; Mon, 04 Dec 2023 08:30:27 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=suse.com; s=google; t=1701707426; x=1702312226; darn=sourceware.org;
        h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc
         :to:content-language:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=/muX3/IPaVGfpdt3tNAVpZj0EQmBp1UojawqSwe/T2M=;
        b=RDknjL0c7nMHR29XV21Y4ySWeDXQLnN8LqJhznIhUYLo3hvfnjr2/K3HL+yBJO3blL
         kVYIPbJ+tnlATk3ux/r0wFz9vW2fIgG88NzwehF45eb65T8MaAGF3POQuHxetSpbxUw8
         HaOl6V00V0fM2pwyWQp3Co5OribD+8aY5gLXe9XgQXD2Pt7Yaz4XsvbSprpMRbdbRC5P
         z2yoxxzQg/hgfunLzAPVVjQNpFUXCAIJ/pnUIqjxQ/S2uzGdjtuyznTsxKTa+ubvjUE3
         0SKIlOGhqBv2V8+pGSx8O2AKyIXXy4QVt4g/OtDJcnIk6EHSDnm69d9LrcMCHndvF1fs
         GR3A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1701707426; x=1702312226;
        h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc
         :to:content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=/muX3/IPaVGfpdt3tNAVpZj0EQmBp1UojawqSwe/T2M=;
        b=nnWOvpnfaOOeHn87Srdd1ZKxbgWa7Xyv6gXS7luB47+Wxgue2WzuL9CFFoffNHf6jh
         o0d7rs+4Ip4x1dnX/DKZwogalBULmmCZ5VKcZG/7LIUBDZv2f9qCMY2eEYeHTI1g/Xy5
         facCKXm68oEc1BGJDY2UtvxpIJF/gRo0Hu+oyAqbjFGyhslXOZuPKJBvD3oZm5/TBd27
         zQFtrmPvq8u5xJvwIYT0EohfatDV0pech9SbCbQ42bhCpmKE0RUb8WFhdTan1qNXjOAn
         gKpynCvhFQYK39W0X8NjgjxRQtuEe445MOmdKpt60ipeb8N6MfIjW4GhUISXNnU/paUl
         jdag==
X-Gm-Message-State: AOJu0Yx7lmOi2iY3dtQQLprHdCzohk4WKCgveSMlWRnLLsdSyRAw3jFC
	qloHdwr+hUgCyk8781n7rF6Q
X-Google-Smtp-Source: AGHT+IEyaE1qwiA9WPf4LSM6xnEITd89IDEyvRRt8tUCOrLO2ZORlz2gTv/Xk3b5uivy2+Paa6J8lA==
X-Received: by 2002:a05:600c:4ec8:b0:40b:5e1e:b3be with SMTP id g8-20020a05600c4ec800b0040b5e1eb3bemr2693502wmq.60.1701707426591;
        Mon, 04 Dec 2023 08:30:26 -0800 (PST)
Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209])
        by smtp.gmail.com with ESMTPSA id jg23-20020a05600ca01700b0040b30be6244sm15588830wmb.24.2023.12.04.08.30.26
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Mon, 04 Dec 2023 08:30:26 -0800 (PST)
Message-ID: <14e62b44-2c5b-4a3b-a66a-cffa8f33578c@suse.com>
Date: Mon, 4 Dec 2023 17:30:25 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
Content-Language: en-US
To: "Cui, Lili" <lili.cui@intel.com>
Cc: hongjiu.lu@intel.com, binutils@sourceware.org
References: <20231124070213.3886483-1-lili.cui@intel.com>
 <20231124070213.3886483-2-lili.cui@intel.com>
From: Jan Beulich <jbeulich@suse.com>
Autocrypt: addr=jbeulich@suse.com; keydata=
 xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk
 hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK
 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD
 /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py
 O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl
 MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP
 nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo
 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp
 Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC
 AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee
 e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF
 hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l
 IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS
 FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj
 t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8
 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3
 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9
 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V
 m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM
 EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr
 wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A
 nAuWpQkjM1ASeQwSHEeAWPgskBQL
In-Reply-To: <20231124070213.3886483-2-lili.cui@intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-3026.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>

On 24.11.2023 08:02, Cui, Lili wrote:
> @@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +static INLINE bool
> +is_apx_rex2_encoding (void)
> +{
> +  return i.rex2 || i.rex2_encoding;
> +}

This function is used just once. Do we really need it? Or else why
don't you use it near the end of md_assemble()?

> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>  }
>  
> +/* Build (2 bytes) rex2 prefix.
> +   | D5h |
> +   | m | R4 X4 B4 | W R X B |
> +*/
> +static void
> +build_rex2_prefix (void)
> +{
> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */

How do they handle it the same? (Also I don't think this is useful as
a code comment; it instead belongs in the description imo.)

> +  i.vex.length = 2;
> +  i.vex.bytes[0] = 0xd5;
> +  /* For the W R X B bits, the variables of rex prefix will be reused.  */
> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> +		    | (i.rex2 << 4) | i.rex);
> +}
> +
>  static void
>  process_immext (void)
>  {
> @@ -4385,12 +4414,16 @@ optimize_encoding (void)
>  	  i.suffix = 0;
>  	  /* Convert to byte registers.  */
>  	  if (i.types[1].bitfield.word)
> -	    j = 16;
> -	  else if (i.types[1].bitfield.dword)
> +	    /* There are 40 8-bit registers.  */
>  	    j = 32;
> +	  else if (i.types[1].bitfield.dword)
> +	    /* 32 8-bit registers + 32 16-bit registers.  */
> +	    j = 64;
>  	  else
> -	    j = 48;
> -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
> +	    /* 32 8-bit registers + 32 16-bit registers
> +	       + 32 32-bit registers.  */
> +	    j = 96;
> +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
>  	    j += 8;
>  	  i.op[1].regs -= j;
>  	}

I did comment on, in particular, the 8-bit register counts before.
Afaict the comments above are nevertheless unchanged and hence
still not really correct.

> @@ -5576,6 +5615,13 @@ md_assemble (char *line)
>  	  return;
>  	}
>  
> +      /* Check for explicit REX2 prefix.  */
> +      if (i.rex2_encoding)
> +	{
> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> +	  return;
> +	}

Again I'm pretty sure I pointed out before that i.rex2_encoding reflects
use of {rex2}. Which then the error message should correctly refer to.

> @@ -5615,11 +5661,12 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
>  
> -      i.rex |= REX_OPCODE;
> +      if (!i.rex2)
> +	i.rex |= REX_OPCODE;
>        for (x = 0; x < 2; x++)
>  	{
>  	  /* Look for 8 bit operand that uses old registers.  */
> @@ -5630,7 +5677,7 @@ md_assemble (char *line)
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
>  		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> +			  "instruction requiring REX/REX2 prefix."),
>  			register_prefix, i.op[x].regs->reg_name);
>  
>  	      /* Otherwise it is equivalent to the extended register.
> @@ -5642,11 +5689,11 @@ md_assemble (char *line)
>  	}
>      }
>  
> -  if (i.rex == 0 && i.rex_encoding)
> +  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))

Doesn't this want to be

  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))

?

>      {
>        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
>  	 that uses legacy register.  If it is "hi" register, don't add
> -	 the REX_OPCODE byte.  */
> +	 rex and rex2 prefix.  */
>        int x;
>        for (x = 0; x < 2; x++)
>  	if (i.types[x].bitfield.class == Reg
> @@ -5656,6 +5703,7 @@ md_assemble (char *line)
>  	  {
>  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	    i.rex_encoding = false;
> +	    i.rex2_encoding = false;
>  	    break;
>  	  }
>  
> @@ -5663,7 +5711,13 @@ md_assemble (char *line)
>  	i.rex = REX_OPCODE;
>      }
>  
> -  if (i.rex != 0)
> +  if (i.rex2 != 0 || i.rex2_encoding)
> +    {
> +      build_rex2_prefix ();
> +      /* The individual REX.RXBW bits got consumed.  */
> +      i.rex &= REX_OPCODE;
> +    }
> +  else if (i.rex != 0)
>      add_prefix (REX_OPCODE | i.rex);
>  
>    insert_lfence_before ();
> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>  		  /* {rex} */
>  		  i.rex_encoding = true;
>  		  break;
> +		case Prefix_REX2:
> +		  /* {rex2} */
> +		  i.rex2_encoding = true;
> +		  break;
>  		case Prefix_NoOptimize:
>  		  /* {nooptimize} */
>  		  i.no_optimize = true;
> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)
> +{
> +  if (!t->opcode_modifier.noegpr)
> +    return 0;
> +
> +  for (unsigned int op = 0; op < i.operands; op++)
> +    {
> +      if (i.types[op].bitfield.class != Reg
> +	  /* Special case for (%dx) while doing input/output op */
> +	  || i.input_output_operand)

Didn't we agree that this extra condition isn't necessary, once the
producer site correctly updates all state (which was supposed to be
done in a small prereq patch)?

> @@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
>        /* Do not verify operands when there are none.  */
>        if (!t->operands)
>  	{
> -	  if (VEX_check_encoding (t))
> +	  /* When there are no operands, we still need to use the
> +	     check_EgprOperands function to check whether {rex2} is valid.  */
> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))

As before imo either the function name wants changing (so it becomes
reasonable to use here, without the need for a comment explaining the
oddity), or you simply open-code the sole check that is needed here
(afaict: t->opcode_modifier.noegpr && i.rex2_encoding).

> @@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
>  	  continue;
>  	}
>  
> +      /* Check if EGRPS operands(r16-r31) are valid.  */

EGPR?

> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
>  @code{avx10.1/256},
>  @code{avx10.1/128},
>  @code{user_msr},
> +@code{apx_f},
>  @code{amx_int8},
>  @code{amx_bf16},
>  @code{amx_fp16},
> @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
>  instructions (x86-64 only).  Note that this differs from the @samp{rex}
>  prefix which generates REX prefix unconditionally.
>  
> +@item
> +@samp{@{rex2@}} -- encode with REX2 prefix

This isn't in line with what's said for {rex}. Iirc we were in
agreement that we want both to behave consistently. In which case
documentation also needs to describe them consistently.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> @@ -0,0 +1,86 @@
> +# Check 64bit instructions with rex2 prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +         test	$0x7, %r24b
> +         test	$0x7, %r24d
> +         test	$0x7, %r24
> +         test	$0x7, %r24w
> +## REX2.M bit
> +         imull	%eax, %r15d
> +         imull	%eax, %r16d
> +         punpckldq (%r18), %mm2
> +## REX2.R4 bit
> +         leal	(%rax), %r16d
> +         leal	(%rax), %r17d
> +         leal	(%rax), %r18d
> +         leal	(%rax), %r19d
> +         leal	(%rax), %r20d
> +         leal	(%rax), %r21d
> +         leal	(%rax), %r22d
> +         leal	(%rax), %r23d
> +         leal	(%rax), %r24d
> +         leal	(%rax), %r25d
> +         leal	(%rax), %r26d
> +         leal	(%rax), %r27d
> +         leal	(%rax), %r28d
> +         leal	(%rax), %r29d
> +         leal	(%rax), %r30d
> +         leal	(%rax), %r31d
> +## REX2.X4 bit
> +         leal	(,%r16), %eax
> +         leal	(,%r17), %eax
> +         leal	(,%r18), %eax
> +         leal	(,%r19), %eax
> +         leal	(,%r20), %eax
> +         leal	(,%r21), %eax
> +         leal	(,%r22), %eax
> +         leal	(,%r23), %eax
> +         leal	(,%r24), %eax
> +         leal	(,%r25), %eax
> +         leal	(,%r26), %eax
> +         leal	(,%r27), %eax
> +         leal	(,%r28), %eax
> +         leal	(,%r29), %eax
> +         leal	(,%r30), %eax
> +         leal	(,%r31), %eax
> +## REX.B4 bit

Further up you properly say REX2. Here and below it's only REX?

> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> @@ -5,3 +5,61 @@ pseudos:
>  	{rex} vmovaps %xmm7,%xmm2
>  	{rex} vmovaps %xmm17,%xmm2
>  	{rex} rorx $7,%eax,%ebx
> +	{rex2} vmovaps %xmm7,%xmm2
> +	{rex2} xsave (%rax)
> +	{rex2} xsaves (%ecx)
> +	{rex2} xsaves64 (%ecx)
> +	{rex2} xsavec (%ecx)
> +	{rex2} xrstors (%ecx)
> +	{rex2} xrstors64 (%ecx)
> +
> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> +	{rex2} mov    0x90909090,%al
> +	{rex2} movabs 0x1,%al
> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> +	{rex2} lodsb
> +	{rex2} lods   %ds:(%esi),%al
> +	{rex2} lodsb   (%esi)
> +	{rex2} movs
> +	{rex2} movs   (%esi), (%edi)
> +	{rex2} scasl
> +	{rex2} scas   %es:(%edi),%eax
> +	{rex2} scasb   (%edi)
> +	{rex2} stosb
> +	{rex2} stosb   (%edi)
> +	{rex2} stos   %eax,%es:(%edi)
> +
> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.

This also covers map 1 row 8, doesn't it?

> +	{rex2} jo     .+2-0x70
> +	{rex2} jno    .+2-0x70
> +	{rex2} jb     .+2-0x70
> +	{rex2} jae    .+2-0x70
> +	{rex2} je     .+2-0x70
> +	{rex2} jne    .+2-0x70
> +	{rex2} jbe    .+2-0x70
> +	{rex2} ja     .+2-0x70
> +	{rex2} js     .+2-0x70
> +	{rex2} jns    .+2-0x70
> +	{rex2} jp     .+2-0x70
> +	{rex2} jnp    .+2-0x70
> +	{rex2} jl     .+2-0x70
> +	{rex2} jge    .+2-0x70
> +	{rex2} jle    .+2-0x70
> +	{rex2} jg     .+2-0x70
> +
> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> +	{rex2} in $0x90,%al
> +	{rex2} in $0x90
> +	{rex2} out $0x90,%al
> +	{rex2} out $0x90
> +	{rex2} jmp  *%eax
> +	{rex2} loop foo

Isn't this row 0xE?

> +	#All opcodes in the row 0xf3* prefixed REX2 are illegal.

This comment continues to be confusing: 0xf3 is a REP prefix. Perhaps
best to either say "map 1" and omit the "f" or at least write 0x0f3*
or slightly better 0x0f 0x3*.

> --- a/gas/testsuite/gas/i386/x86-64-pseudos.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
> @@ -360,6 +360,19 @@ _start:
>  	{rex} movaps (%r8),%xmm2
>  	{rex} phaddw (%rcx),%mm0
>  	{rex} phaddw (%r8),%mm0
> +	{rex2} mov %al,%ah
> +	{rex2} shl %cl, %eax
> +	{rex2} cmp %cl, %dl
> +	{rex2} mov $1, %bl
> +	{rex2} movl %eax,%ebx
> +	{rex2} movl %eax,%r14d
> +	{rex2} movl %eax,(%r8)
> +	{rex2} movaps %xmm7,%xmm2
> +	{rex2} movaps %xmm7,%xmm12
> +	{rex2} movaps (%rcx),%xmm2
> +	{rex2} movaps (%r8),%xmm2
> +	{rex2} pmullw %mm0,%mm6
> +
>  
>  	movb (%rbp),%al
>  	{disp8} movb (%rbp),%al

No double blank lines please.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c

Disassembler comments (if any) in a separate (later) mail again.

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
>      "64" },
>    { "USER_MSR",
>      "64" },
> +  { "APX_F",
> +    "XSAVE|64" },
>  };
>  
>  /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
> @@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
>    BITFIELD (FRED),
>    BITFIELD (LKGS),
>    BITFIELD (USER_MSR),
> +  BITFIELD (APX_F),
>    BITFIELD (MWAITX),
>    BITFIELD (CLZERO),
>    BITFIELD (OSPKE),
> @@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
>    BITFIELD (ATTSyntax),
>    BITFIELD (IntelSyntax),
>    BITFIELD (ISA64),
> +  BITFIELD (NoEgpr),
>  };
>  
>  #define CLASS(n) #n, n
> @@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +rex2_disallowed (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)
> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0)
> +    return true;

Wasn't this intended to be dropped, being redundant with the opcode table
attributes?

> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> +  if (space == 0)
> +    switch (opcode >> 4)

Both here and ...

> +      {
> +      case 0x4:
> +      case 0x7:
> +      case 0xA:
> +      case 0xE:
> +	return true;
> +      default:
> +	return false;
> +    }
> +
> +  if (space == SPACE_0F)
> +    switch (opcode >> 4)

... here, don't you also need to mask off further bits? There are
quite a few opcodes which have a kind-of ModR/M byte encoded directly
in the opcode, for example.

> +      {
> +      case 0x3:
> +      case 0x8:
> +	return true;
> +      default:
> +	return false;
> +      }
> +
> +  return false;
> +}
> +
>  static void
>  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  			      unsigned int prefix, const char *extension_opcode,
> -			      char **opnd, int lineno)
> +			      char **opnd, int lineno, bool rex2_disallowed)
>  {
>    char *str, *next, *last;
>    bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
> @@ -1202,6 +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	  || modifiers[SAE].value))
>      modifiers[EVex].value = EVEXDYN;
>  
> +  /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
> +     For template supports both Vex and EVex allowing EGPR.  */

"Templates supporting both Vex and EVex allow EGPR."

> +  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
> +      && !modifiers[EVex].value)
> +    modifiers[NoEgpr].value = 1;
> +
>    output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
>  }
>  
> @@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>  	   ident, 2 * (int)length, opcode, end, i);
>    free (ident);
>  
> +  /* Add some specilal handle for current entry.  */
> +  bool  has_special_handle = rex2_disallowed (opcode, space, cpu_flags);

The local variable (if one is needed in the first place) wants naming as
usefully as the function now is named. Similarly the comment would want
improving alonmg those lines.

>    process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
> -				extension_opcode, operand_types, lineno);
> +				extension_opcode, operand_types, lineno,
> +				has_special_handle);
>  
>    process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
>  
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -138,6 +138,7 @@
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.

Stray change (in general please avoid introducing double blank lines, as
those make patch context less useful).

Jan