public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] x86: suffix handling changes
@ 2022-10-05  7:19 Jan Beulich
  2022-10-05  7:20 ` [PATCH v3 1/7] x86: constify parse_insn()'s input Jan Beulich
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:19 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

... accompanied by a few other improvements (or so I hope) found
along the way.

1: constify parse_insn()'s input
2: introduce Pass2 insn attribute
3: re-work insn/suffix recognition
4: further re-work insn/suffix recognition to also cover MOVSL
5: don't recognize/derive Q suffix in the common case
6: allow HLE store of accumulator to absolute 32-bit address
7: move bad-use-of-TLS-reloc check

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 1/7] x86: constify parse_insn()'s input
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
@ 2022-10-05  7:20 ` Jan Beulich
  2022-10-05  7:22 ` [PATCH v3 2/7] x86: introduce Pass2 insn attribute Jan Beulich
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:20 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

The function doesn't alter its input buffer: Reflect this in its
prototype. To avoid using any kind of cast, simply calculate the update
of "line" from the function's input and output.
---
Avoiding a cast the way it's done here is going to be useful in
subsequent patches (there'll be code added between the call and the
update of "line"), so switching to a cast is not really an option.
---
v3: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -157,7 +157,7 @@ static int i386_intel_operand (char *, i
 static int i386_intel_simplify (expressionS *);
 static int i386_intel_parse_name (const char *, expressionS *);
 static const reg_entry *parse_register (char *, char **);
-static char *parse_insn (char *, char *);
+static const char *parse_insn (const char *, char *);
 static char *parse_operands (char *, const char *);
 static void swap_operands (void);
 static void swap_2_operands (unsigned int, unsigned int);
@@ -4820,6 +4820,7 @@ md_assemble (char *line)
 {
   unsigned int j;
   char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
+  const char *end;
   const insn_template *t;
 
   /* Initialize globals.  */
@@ -4835,9 +4836,10 @@ md_assemble (char *line)
      We assume that the scrubber has arranged it so that line[0] is the valid
      start of a (possibly prefixed) mnemonic.  */
 
-  line = parse_insn (line, mnemonic);
-  if (line == NULL)
+  end = parse_insn (line, mnemonic);
+  if (end == NULL)
     return;
+  line += end - line;
   mnem_suffix = i.suffix;
 
   line = parse_operands (line, mnemonic);
@@ -5217,11 +5219,10 @@ md_assemble (char *line)
     last_insn.kind = last_insn_other;
 }
 
-static char *
-parse_insn (char *line, char *mnemonic)
+static const char *
+parse_insn (const char *line, char *mnemonic)
 {
-  char *l = line;
-  char *token_start = l;
+  const char *l = line, *token_start = l;
   char *mnem_p;
   int supported;
   const insn_template *t;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 2/7] x86: introduce Pass2 insn attribute
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
  2022-10-05  7:20 ` [PATCH v3 1/7] x86: constify parse_insn()'s input Jan Beulich
@ 2022-10-05  7:22 ` Jan Beulich
  2022-10-05  7:23 ` [PATCH v3 3/7] x86: re-work insn/suffix recognition Jan Beulich
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:22 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Subsequently we will want to mark templates for mnemonics which, due to
their last character also possibly being a suffix, are "shadowing" other
templates (and hence may require a second parsing pass). Introduce this
attribute in a standalone patch, largely to ease looking at the actual
later change. To simplify logic the later change is going to look for
the new attribute only on the first template of a group. Make i386-gen
warn about misplaced Pass2 (redundant ones are okay).

While there convert "active_isstring" to bool, matching the new static
variable.
---
As to redundant attributes: An open question is whether - to avoid
surprises when e.g. templates are re-ordered - it wouldn't be desirable
to add Pass2 on all templates of a group.
---
v3: New.

--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -18,6 +18,7 @@
    MA 02110-1301, USA.  */
 
 #include "sysdep.h"
+#include <stdbool.h>
 #include <stdio.h>
 #include <errno.h>
 #include "getopt.h"
@@ -702,6 +703,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (FWait),
   BITFIELD (IsString),
   BITFIELD (RegMem),
+  BITFIELD (Pass2),
   BITFIELD (BNDPrefixOk),
   BITFIELD (RegKludge),
   BITFIELD (Implicit1stXmm0),
@@ -801,7 +803,7 @@ static bitfield operand_types[] =
 
 static const char *filename;
 static i386_cpu_flags active_cpu_flags;
-static int active_isstring;
+static bool active_isstring, active_pass2;
 
 struct template_arg {
   const struct template_arg *next;
@@ -1169,7 +1171,8 @@ process_i386_opcode_modifier (FILE *tabl
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
 
-  active_isstring = 0;
+  active_isstring = false;
+  active_pass2 = false;
 
   /* Copy the default opcode modifier.  */
   memcpy (modifiers, opcode_modifiers, sizeof (modifiers));
@@ -1192,8 +1195,11 @@ process_i386_opcode_modifier (FILE *tabl
 
 	      set_bitfield (str, modifiers, val, ARRAY_SIZE (modifiers),
 			    lineno);
+
 	      if (strcasecmp(str, "IsString") == 0)
-		active_isstring = 1;
+		active_isstring = true;
+	      if (strcasecmp(str, "Pass2") == 0)
+		active_pass2 = true;
 
 	      if (strcasecmp(str, "W") == 0)
 		have_w = 1;
@@ -1868,6 +1874,7 @@ process_i386_opcodes (FILE *table)
   for (j = 0; j < i; j++)
     {
       struct opcode_hash_entry *next;
+      bool first_pass2 = false;
 
       for (next = opcode_array[j]; next; next = next->next)
 	{
@@ -1876,6 +1883,14 @@ process_i386_opcodes (FILE *table)
 	  lineno = next->lineno;
 	  last = str + strlen (str);
 	  output_i386_opcode (table, name, str, last, lineno);
+
+	  if (next == opcode_array[j])
+	    first_pass2 = active_pass2;
+	  else if (!first_pass2 && active_pass2)
+	    fprintf (stderr,
+		     "Warning: %s:%d: %s: "
+		     "'Pass2' is recognized only on the first template in a group\n",
+		     filename, lineno, name);
 	}
     }
 
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -504,6 +504,9 @@ enum
      Normally, it will be encoded in the reg field. We add a RegMem
      flag to indicate that it should be encoded in the regmem field.  */
   RegMem,
+  /* The (unsuffixed) mnemonic represents itself the suffixed form of another
+     mnemonic, potentially requiring a second parsing pass.  */
+  Pass2,
   /* quick test if branch instruction is MPX supported */
   BNDPrefixOk,
   /* fake an extra reg operand for clr, imul and special register
@@ -732,6 +735,7 @@ typedef struct i386_opcode_modifier
   unsigned int fwait:1;
   unsigned int isstring:2;
   unsigned int regmem:1;
+  unsigned int pass2:1;
   unsigned int bndprefixok:1;
   unsigned int regkludge:1;
   unsigned int implicit1stxmm0:1;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
  2022-10-05  7:20 ` [PATCH v3 1/7] x86: constify parse_insn()'s input Jan Beulich
  2022-10-05  7:22 ` [PATCH v3 2/7] x86: introduce Pass2 insn attribute Jan Beulich
@ 2022-10-05  7:23 ` Jan Beulich
  2022-10-05 23:52   ` H.J. Lu
  2022-10-05  7:24 ` [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:23 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

PR gas/29524
Having templates with a suffix explicitly present has always been
quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
a suitable template _and_ didn't itself already need to trim off a
suffix to find a match at all. This requires error reporting adjustments
(albeit luckily fewer than I was afraid might be necessary), as errors
previously reported during matching now need deferring until after the
2nd pass (because, obviously, we must not emit any error if the 2nd pass
succeeds).

PR gas/29525
Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
templates, mixed IsString/non-IsString template groups cannot occur
anymore. With that maybe_adjust_templates() becomes unnecessary (and is
hence being removed).

PR gas/29526
Note further that while the additions to the intel16 testcase aren't
really proper Intel syntax, we've been permitting all of those except
for the MOVD variant. The test therefore is to avoid re-introducing such
an inconsistency.
---
To limit code churn I'm using "goto" for the retry loop, but I'd be
happy to make this a proper loop either right here or in a follow-on
change doing just the necessary re-indentation.

The "too many memory references" errors which are being deleted weren't
fully consistent anyway - even the majority of IsString insns accepts
only a single memory operand. If we wanted to retain that, it would need
re-introducing in md_assemble(), latching the error into i.error just
like match_template() does.

Why is "MOVQ $imm64, %reg64" being optimized but "MOVABS $imm64, %reg64"
is not?
---
v3: Limit xstrdup() to just the templates where a 2nd pass actually
    makes sense (new Pass2 attribute).

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -297,9 +297,6 @@ struct _i386_insn
        explicit segment overrides are given.  */
     const reg_entry *seg[2];
 
-    /* Copied first memory operand string, for re-checking.  */
-    char *memop1_string;
-
     /* PREFIX holds all the given prefix opcodes (usually null).
        PREFIXES is the number of prefix opcodes.  */
     unsigned int prefixes;
@@ -4273,7 +4270,20 @@ optimize_encoding (void)
 	   movq $imm31, %r64   -> movl $imm31, %r32
 	   movq $imm32, %r64   -> movl $imm32, %r32
         */
-      i.tm.opcode_modifier.norex64 = 1;
+      i.tm.opcode_modifier.size = SIZE32;
+      if (i.imm_operands)
+	{
+	  i.types[0].bitfield.imm32 = 1;
+	  i.types[0].bitfield.imm32s = 0;
+	  i.types[0].bitfield.imm64 = 0;
+	}
+      else
+	{
+	  i.types[0].bitfield.dword = 1;
+	  i.types[0].bitfield.qword = 0;
+	}
+      i.types[1].bitfield.dword = 1;
+      i.types[1].bitfield.qword = 0;
       if (i.tm.base_opcode == 0xb8 || (i.tm.base_opcode | 1) == 0xc7)
 	{
 	  /* Handle
@@ -4283,11 +4293,6 @@ optimize_encoding (void)
 	  i.tm.operand_types[0].bitfield.imm32 = 1;
 	  i.tm.operand_types[0].bitfield.imm32s = 0;
 	  i.tm.operand_types[0].bitfield.imm64 = 0;
-	  i.types[0].bitfield.imm32 = 1;
-	  i.types[0].bitfield.imm32s = 0;
-	  i.types[0].bitfield.imm64 = 0;
-	  i.types[1].bitfield.dword = 1;
-	  i.types[1].bitfield.qword = 0;
 	  if ((i.tm.base_opcode | 1) == 0xc7)
 	    {
 	      /* Handle
@@ -4819,11 +4824,14 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
-  const char *end;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy = NULL;
+  const char *end, *pass1_mnem = NULL;
+  enum i386_error pass1_err = 0;
   const insn_template *t;
 
   /* Initialize globals.  */
+  current_templates = NULL;
+ retry:
   memset (&i, '\0', sizeof (i));
   i.rounding.type = rc_none;
   for (j = 0; j < MAX_OPERANDS; j++)
@@ -4838,16 +4846,26 @@ md_assemble (char *line)
 
   end = parse_insn (line, mnemonic);
   if (end == NULL)
-    return;
+    {
+      if (pass1_mnem != NULL)
+	goto match_error;
+      return;
+    }
+  if (current_templates->start->opcode_modifier.pass2)
+    {
+      /* Make a copy of the full line in case we need to retry.  */
+      copy = xstrdup (line);
+    }
   line += end - line;
   mnem_suffix = i.suffix;
 
   line = parse_operands (line, mnemonic);
   this_operand = -1;
-  xfree (i.memop1_string);
-  i.memop1_string = NULL;
   if (line == NULL)
-    return;
+    {
+      free (copy);
+      return;
+    }
 
   /* Now we've parsed the mnemonic into a set of templates, and have the
      operands at hand.  */
@@ -4923,7 +4941,97 @@ md_assemble (char *line)
      with the template operand types.  */
 
   if (!(t = match_template (mnem_suffix)))
-    return;
+    {
+      const char *err_msg;
+
+      if (copy && !mnem_suffix)
+	{
+	  pass1_err = i.error;
+	  pass1_mnem = current_templates->start->name;
+	  line = copy;
+	  copy = NULL;
+	  goto retry;
+	}
+      free (copy);
+  match_error:
+      switch (pass1_mnem ? pass1_err : i.error)
+	{
+	default:
+	  abort ();
+	case operand_size_mismatch:
+	  err_msg = _("operand size mismatch");
+	  break;
+	case operand_type_mismatch:
+	  err_msg = _("operand type mismatch");
+	  break;
+	case register_type_mismatch:
+	  err_msg = _("register type mismatch");
+	  break;
+	case number_of_operands_mismatch:
+	  err_msg = _("number of operands mismatch");
+	  break;
+	case invalid_instruction_suffix:
+	  err_msg = _("invalid instruction suffix");
+	  break;
+	case bad_imm4:
+	  err_msg = _("constant doesn't fit in 4 bits");
+	  break;
+	case unsupported_with_intel_mnemonic:
+	  err_msg = _("unsupported with Intel mnemonic");
+	  break;
+	case unsupported_syntax:
+	  err_msg = _("unsupported syntax");
+	  break;
+	case unsupported:
+	  as_bad (_("unsupported instruction `%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name);
+	  return;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
+	case invalid_vsib_address:
+	  err_msg = _("invalid VSIB address");
+	  break;
+	case invalid_vector_register_set:
+	  err_msg = _("mask, index, and destination registers must be distinct");
+	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("all tmm registers must be distinct");
+	  break;
+	case invalid_dest_and_src_register_set:
+	  err_msg = _("destination and source registers must be distinct");
+	  break;
+	case unsupported_vector_index_register:
+	  err_msg = _("unsupported vector index register");
+	  break;
+	case unsupported_broadcast:
+	  err_msg = _("unsupported broadcast");
+	  break;
+	case broadcast_needed:
+	  err_msg = _("broadcast is needed for operand of such type");
+	  break;
+	case unsupported_masking:
+	  err_msg = _("unsupported masking");
+	  break;
+	case mask_not_on_destination:
+	  err_msg = _("mask not on destination operand");
+	  break;
+	case no_default_mask:
+	  err_msg = _("default mask isn't allowed");
+	  break;
+	case unsupported_rc_sae:
+	  err_msg = _("unsupported static rounding/sae");
+	  break;
+	case invalid_register_operand:
+	  err_msg = _("invalid register operand");
+	  break;
+	}
+      as_bad (_("%s for `%s'"), err_msg,
+	      pass1_mnem ? pass1_mnem : current_templates->start->name);
+      return;
+    }
+
+  free (copy);
 
   if (sse_check != check_none
       /* The opcode space check isn't strictly needed; it's there only to
@@ -5224,6 +5332,7 @@ parse_insn (const char *line, char *mnem
 {
   const char *l = line, *token_start = l;
   char *mnem_p;
+  bool pass1 = !current_templates;
   int supported;
   const insn_template *t;
   char *dot_p = NULL;
@@ -5393,8 +5502,10 @@ parse_insn (const char *line, char *mnem
       current_templates = (const templates *) str_hash_find (op_hash, mnemonic);
     }
 
-  if (!current_templates)
+  if (!current_templates || !pass1)
     {
+      current_templates = NULL;
+
     check_suffix:
       if (mnem_p > mnemonic)
 	{
@@ -5442,7 +5553,8 @@ parse_insn (const char *line, char *mnem
 
       if (!current_templates)
 	{
-	  as_bad (_("no such instruction: `%s'"), token_start);
+	  if (pass1)
+	    as_bad (_("no such instruction: `%s'"), token_start);
 	  return NULL;
 	}
     }
@@ -6852,81 +6964,7 @@ match_template (char mnem_suffix)
   if (t == current_templates->end)
     {
       /* We found no match.  */
-      const char *err_msg;
-      switch (specific_error)
-	{
-	default:
-	  abort ();
-	case operand_size_mismatch:
-	  err_msg = _("operand size mismatch");
-	  break;
-	case operand_type_mismatch:
-	  err_msg = _("operand type mismatch");
-	  break;
-	case register_type_mismatch:
-	  err_msg = _("register type mismatch");
-	  break;
-	case number_of_operands_mismatch:
-	  err_msg = _("number of operands mismatch");
-	  break;
-	case invalid_instruction_suffix:
-	  err_msg = _("invalid instruction suffix");
-	  break;
-	case bad_imm4:
-	  err_msg = _("constant doesn't fit in 4 bits");
-	  break;
-	case unsupported_with_intel_mnemonic:
-	  err_msg = _("unsupported with Intel mnemonic");
-	  break;
-	case unsupported_syntax:
-	  err_msg = _("unsupported syntax");
-	  break;
-	case unsupported:
-	  as_bad (_("unsupported instruction `%s'"),
-		  current_templates->start->name);
-	  return NULL;
-	case invalid_sib_address:
-	  err_msg = _("invalid SIB address");
-	  break;
-	case invalid_vsib_address:
-	  err_msg = _("invalid VSIB address");
-	  break;
-	case invalid_vector_register_set:
-	  err_msg = _("mask, index, and destination registers must be distinct");
-	  break;
-	case invalid_tmm_register_set:
-	  err_msg = _("all tmm registers must be distinct");
-	  break;
-	case invalid_dest_and_src_register_set:
-	  err_msg = _("destination and source registers must be distinct");
-	  break;
-	case unsupported_vector_index_register:
-	  err_msg = _("unsupported vector index register");
-	  break;
-	case unsupported_broadcast:
-	  err_msg = _("unsupported broadcast");
-	  break;
-	case broadcast_needed:
-	  err_msg = _("broadcast is needed for operand of such type");
-	  break;
-	case unsupported_masking:
-	  err_msg = _("unsupported masking");
-	  break;
-	case mask_not_on_destination:
-	  err_msg = _("mask not on destination operand");
-	  break;
-	case no_default_mask:
-	  err_msg = _("default mask isn't allowed");
-	  break;
-	case unsupported_rc_sae:
-	  err_msg = _("unsupported static rounding/sae");
-	  break;
-	case invalid_register_operand:
-	  err_msg = _("invalid register operand");
-	  break;
-	}
-      as_bad (_("%s for `%s'"), err_msg,
-	      current_templates->start->name);
+      i.error = specific_error;
       return NULL;
     }
 
@@ -11335,49 +11373,6 @@ RC_SAE_immediate (const char *imm_start)
   return 1;
 }
 
-/* Only string instructions can have a second memory operand, so
-   reduce current_templates to just those if it contains any.  */
-static int
-maybe_adjust_templates (void)
-{
-  const insn_template *t;
-
-  gas_assert (i.mem_operands == 1);
-
-  for (t = current_templates->start; t < current_templates->end; ++t)
-    if (t->opcode_modifier.isstring)
-      break;
-
-  if (t < current_templates->end)
-    {
-      static templates aux_templates;
-      bool recheck;
-
-      aux_templates.start = t;
-      for (; t < current_templates->end; ++t)
-	if (!t->opcode_modifier.isstring)
-	  break;
-      aux_templates.end = t;
-
-      /* Determine whether to re-check the first memory operand.  */
-      recheck = (aux_templates.start != current_templates->start
-		 || t != current_templates->end);
-
-      current_templates = &aux_templates;
-
-      if (recheck)
-	{
-	  i.mem_operands = 0;
-	  if (i.memop1_string != NULL
-	      && i386_index_check (i.memop1_string) == 0)
-	    return 0;
-	  i.mem_operands = 1;
-	}
-    }
-
-  return 1;
-}
-
 static INLINE bool starts_memory_operand (char c)
 {
   return ISDIGIT (c)
@@ -11528,17 +11523,6 @@ i386_att_operand (char *operand_string)
       char *displacement_string_end;
 
     do_memory_reference:
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((i.mem_operands == 1
-	   && !current_templates->start->opcode_modifier.isstring)
-	  || i.mem_operands == 2)
-	{
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
-	}
-
       /* Check for base index form.  We detect the base index form by
 	 looking for an ')' at the end of the operand, searching
 	 for the '(' matching it, and finding a REGISTER_PREFIX or ','
@@ -11745,8 +11729,6 @@ i386_att_operand (char *operand_string)
       if (i386_index_check (operand_string) == 0)
 	return 0;
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       i.mem_operands++;
     }
   else
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -993,10 +993,7 @@ i386_intel_operand (char *operand_string
 	   || intel_state.is_mem)
     {
       /* Memory operand.  */
-      if (i.mem_operands == 1 && !maybe_adjust_templates ())
-	return 0;
-      if ((int) i.mem_operands
-	  >= 2 - !current_templates->start->opcode_modifier.isstring)
+      if (i.mem_operands)
 	{
 	  /* Handle
 
@@ -1041,10 +1038,6 @@ i386_intel_operand (char *operand_string
 		    }
 		}
 	    }
-
-	  as_bad (_("too many memory references for `%s'"),
-		  current_templates->start->name);
-	  return 0;
 	}
 
       /* Swap base and index in 16-bit memory operands like
@@ -1158,8 +1151,6 @@ i386_intel_operand (char *operand_string
 	return 0;
 
       i.flags[this_operand] |= Operand_Mem;
-      if (i.mem_operands == 0)
-	i.memop1_string = xstrdup (operand_string);
       ++i.mem_operands;
     }
   else
--- a/gas/testsuite/gas/i386/code16.s
+++ b/gas/testsuite/gas/i386/code16.s
@@ -1,9 +1,9 @@
 	.text
 	.code16
-	rep; movsd
-	rep; cmpsd
-	rep movsd %ds:(%si),%es:(%di)
-	rep cmpsd %es:(%di),%ds:(%si)
+	rep; movsl
+	rep; cmpsl
+	rep movsl %ds:(%si),%es:(%di)
+	rep cmpsl %es:(%di),%ds:(%si)
 
 	mov	%cr2, %ecx
 	mov	%ecx, %cr2
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -73,6 +73,7 @@ if [gas_32_check] then {
     run_dump_test "amd"
     run_dump_test "katmai"
     run_dump_test "jump"
+    run_dump_test "movs32"
     run_dump_test "movz32"
     run_dump_test "relax-1"
     run_dump_test "relax-2"
@@ -806,6 +807,7 @@ if [gas_64_check] then {
     run_dump_test "x86-64-segovr"
     run_list_test "x86-64-inval-seg" "-al"
     run_dump_test "x86-64-branch"
+    run_dump_test "movs64"
     run_dump_test "movz64"
     run_dump_test "x86-64-relax-1"
     run_dump_test "svme64"
--- a/gas/testsuite/gas/i386/intel16.d
+++ b/gas/testsuite/gas/i386/intel16.d
@@ -20,4 +20,12 @@ Disassembly of section .text:
   2c:	8d 02 [ 	]*lea    \(%bp,%si\),%ax
   2e:	8d 01 [ 	]*lea    \(%bx,%di\),%ax
   30:	8d 03 [ 	]*lea    \(%bp,%di\),%ax
-	...
+[ 	]*[0-9a-f]+:	67 f7 13[ 	]+notw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 f7 17[ 	]+notl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 0f 1f 03[ 	]+nopw[ 	]+\(%ebx\)
+[ 	]*[0-9a-f]+:	66 0f 1f 07[ 	]+nopl[ 	]+\(%bx\)
+[ 	]*[0-9a-f]+:	67 83 03 05[ 	]+addw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 83 07 05[ 	]+addl[ 	]+\$0x5,\(%bx\)
+[ 	]*[0-9a-f]+:	67 c7 03 05 00[ 	]+movw[ 	]+\$0x5,\(%ebx\)
+[ 	]*[0-9a-f]+:	66 c7 07 05 00 00 00[ 	]+movl[ 	]+\$0x5,\(%bx\)
+#pass
--- a/gas/testsuite/gas/i386/intel16.s
+++ b/gas/testsuite/gas/i386/intel16.s
@@ -18,4 +18,14 @@
  lea	ax, [di][bx]
  lea	ax, [di][bp]
 
- .p2align 4,0
+ notw	[ebx]
+ notd	[bx]
+
+ nopw	[ebx]
+ nopd	[bx]
+
+ addw	[ebx], 5
+ addd	[bx], 5
+
+ movw	[ebx], 5
+ movd	[bx], 5
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs.s
@@ -0,0 +1,33 @@
+	.text
+movs:
+	movsb	%al,%ax
+	movsb	(%eax),%ax
+	movsb	%al,%eax
+	movsb	(%eax),%eax
+.ifdef x86_64
+	movsb	%al,%rax
+	movsb	(%rax),%rax
+.endif
+
+	movsbw	%al,%ax
+	movsbw	(%eax),%ax
+	movsbl	%al,%eax
+	movsbl	(%eax),%eax
+.ifdef x86_64
+	movsbq	%al,%rax
+	movsbq	(%rax),%rax
+.endif
+
+	movsw	%ax,%eax
+	movsw	(%eax),%eax
+.ifdef x86_64
+	movsw	%ax,%rax
+	movsw	(%rax),%rax
+.endif
+
+	movswl	%ax,%eax
+	movswl	(%eax),%eax
+.ifdef x86_64
+	movswq	%ax,%rax
+	movswq	(%rax),%rax
+.endif
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs32.d
@@ -0,0 +1,22 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (32-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	0f bf 00 *	movswl \(%eax\),%eax
+#pass
--- /dev/null
+++ b/gas/testsuite/gas/i386/movs64.d
@@ -0,0 +1,30 @@
+#objdump: -dw
+#source: movs.s
+#name: x86 mov with sign-extend (64-bit object)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <movs>:
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	66 0f be c0 *	movsbw %al,%ax
+[ 	]*[a-f0-9]+:	67 66 0f be 00 *	movsbw \(%eax\),%ax
+[ 	]*[a-f0-9]+:	0f be c0 *	movsbl %al,%eax
+[ 	]*[a-f0-9]+:	67 0f be 00 *	movsbl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f be c0 *	movsbq %al,%rax
+[ 	]*[a-f0-9]+:	48 0f be 00 *	movsbq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+[ 	]*[a-f0-9]+:	0f bf c0 *	movswl %ax,%eax
+[ 	]*[a-f0-9]+:	67 0f bf 00 *	movswl \(%eax\),%eax
+[ 	]*[a-f0-9]+:	48 0f bf c0 *	movswq %ax,%rax
+[ 	]*[a-f0-9]+:	48 0f bf 00 *	movswq \(%rax\),%rax
+#pass
--- a/gas/testsuite/gas/i386/movx16.l
+++ b/gas/testsuite/gas/i386/movx16.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx32.l
+++ b/gas/testsuite/gas/i386/movx32.l
@@ -41,11 +41,11 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
@@ -82,7 +82,7 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movswl	%al, %cl
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -106,17 +106,17 @@
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cl
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cl
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %cx
+[ 	]*[1-9][0-9]* \?\?\?\? 660FBEC8[ 	]+movsb	%al, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %cx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBEC8[ 	]+movsb	%al, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
-[ 	]*[1-9][0-9]*[ 	]+movsb	%al, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBEC8[ 	]+movsb	%al, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsb	%rax, %rcx
@@ -192,12 +192,12 @@
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %cx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %ecx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %ecx
+[ 	]*[1-9][0-9]* \?\?\?\? 0FBFC8[ 	]+movsw	%ax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %ecx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %ecx
 [ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movsw	%al, %rcx
-[ 	]*[1-9][0-9]*[ 	]+movsw	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 480FBFC8[ 	]+movsw	%ax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movsw	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -135,47 +135,37 @@
 mov, 0xa0, None, CpuNo64, D|W|No_sSuf|No_qSuf|No_ldSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
 mov, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
 movabs, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
-movq, 0xa1, None, Cpu64, D|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Disp64|Unspecified|Qword, Acc|Qword }
 mov, 0x88, None, 0, D|W|CheckRegSize|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0x89, None, Cpu64, D|Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease, { Reg64, Reg64|Unspecified|Qword|BaseIndex }
 // In the 64bit mode the short form mov immediate is redefined to have
 // 64bit value.
 mov, 0xb0, None, 0, W|No_sSuf|No_qSuf|No_ldSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32 }
 mov, 0xc6, 0, 0, W|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-movq, 0xc7, 0, Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm32S, Reg64|Qword|Unspecified|BaseIndex }
 mov, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 movabs, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Imm64, Reg64 }
-movq, 0xb8, None, Cpu64, Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
 // The segment register moves accept WordReg so that a segment register
 // can be copied to a 32 bit register, and vice versa, without using a
 // size prefix.  When moving to a 32 bit register, the upper 16 bits
 // are set to an implementation defined value (on the Pentium Pro, the
 // implementation defined value is zero).
-mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
+mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
 mov, 0x8c, None, 0, D|Modrm|IgnoreSize|No_bSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { SReg, Word|Unspecified|BaseIndex }
-movq, 0x8c, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg64 }
-mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
+mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
 // Move to/from control debug registers.  In the 16 or 32bit modes
 // they are 32bit.  In the 64bit mode they are 64bit.
 mov, 0xf20, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Control, Reg32 }
 mov, 0xf20, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Control, Reg64 }
-movq, 0xf20, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Control, Reg64 }
 mov, 0xf21, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Debug, Reg32 }
 mov, 0xf21, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
-movq, 0xf21, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
 mov, 0xf24, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Test, Reg32 }
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, None, CpuMovbe, D|Modrm|No_bSuf|No_sSuf|No_ldSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
-// "movsbl" & "movsbw" must not be unified into "movsb" to avoid
-// conflict with the "movs" string move instruction.
-movsbl, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg32 }
-movsbw, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg16 }
-movswl, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg16|Word|Unspecified|BaseIndex, Reg32 }
-movsbq, 0xfbe, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg8|Byte|Unspecified|BaseIndex, Reg64 }
-movswq, 0xfbf, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg16|Word|Unspecified|BaseIndex, Reg64 }
+movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf|Pass2, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|Pass2, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
+// "movslq" must not be converted into "movsl" to avoid conflict with the
+// "movsl" string move instruction.
 movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
@@ -492,9 +482,6 @@ set<cc>, 0xf9<cc:opc>, 0, Cpu386, Modrm|
 // String manipulation.
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string compare.
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ins, 0x6c, None, Cpu186, W|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
@@ -509,9 +496,6 @@ slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|
 slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
-// Intel mode string move.
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
-movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
 smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 scas, 0xae, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
@@ -989,13 +973,13 @@ emms, 0xf77, None, CpuMMX, No_bSuf|No_wS
 // copying between Reg64/Mem64 and RegXMM/RegMMX, as is mandated by Intel's
 // spec). AMD's spec, having been in existence for much longer, failed to
 // recognize that and specified movd for 32- and 64-bit operations.
-movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
+movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX|Pass2, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
 movd, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegXMM }
 movd, 0xf6e, None, CpuMMX, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegMMX }
 movd, 0xf6e, None, CpuMMX|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegMMX }
-movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX|Pass2, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
 movq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|Unspecified|BaseIndex, RegXMM }
 movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
@@ -1159,7 +1143,7 @@ andpd<sse2>, 0x660f54, None, <sse2:cpu>,
 cmp<frel>pd<sse2>, 0x660fc2, <frel:imm>, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|<frel:comm>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|Unspecified|BaseIndex, RegXMM }
 cmp<frel>sd<sse2>, 0xf20fc2, <frel:imm>, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|<frel:comm>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
 cmppd<sse2>, 0x660fc2, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
-cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
+cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Pass2, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 comisd<sse2>, 0x660f2f, None, <sse2:cpu>, Modrm|<sse2:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
 cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
@@ -1184,7 +1168,7 @@ movlpd, 0x6613, None, CpuAVX, Modrm|Vex|
 movlpd, 0x660f12, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
 movmskpd<sse2>, 0x660f50, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
 movntpd<sse2>, 0x660f2b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
-movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
+movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX|Pass2, { Qword|Unspecified|BaseIndex, RegXMM }
 movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
 movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
 movupd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
                   ` (2 preceding siblings ...)
  2022-10-05  7:23 ` [PATCH v3 3/7] x86: re-work insn/suffix recognition Jan Beulich
@ 2022-10-05  7:24 ` Jan Beulich
  2022-10-11 17:44   ` H.J. Lu
  2022-10-05  7:24 ` [PATCH v3 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:24 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

PR gas/29524
In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
to prefix parsing. Utilize i.error just like match_template() does.
---
v3: Re-base over changes to earlier patches (incl use of Pass2).

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -236,6 +236,8 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    unsupported_on_arch,
+    unsupported_64bit,
     invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
@@ -4849,6 +4851,15 @@ md_assemble (char *line)
     {
       if (pass1_mnem != NULL)
 	goto match_error;
+      if (i.error != no_error)
+	{
+	  gas_assert (current_templates != NULL);
+	  if (current_templates->start->opcode_modifier.pass2 && !i.suffix)
+	    goto no_match;
+	  /* No point in trying a 2nd pass - it'll only find the same suffix
+	     again.  */
+	  goto match_error;
+	}
       return;
     }
   if (current_templates->start->opcode_modifier.pass2)
@@ -4948,12 +4959,21 @@ md_assemble (char *line)
 	{
 	  line = copy;
 	  copy = NULL;
+  no_match:
 	  pass1_err = i.error;
 	  pass1_mnem = current_templates->start->name;
 	  goto retry;
 	}
-      free (copy);
+
+      /* If a non-/only-64bit template (group) was found in pass 1, and if
+	 _some_ template (group) was found in pass 2, squash pass 1's
+	 error.  */
+      if (pass1_err == unsupported_64bit)
+	pass1_mnem = NULL;
+
   match_error:
+      free (copy);
+
       switch (pass1_mnem ? pass1_err : i.error)
 	{
 	default:
@@ -4986,6 +5006,17 @@ md_assemble (char *line)
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : current_templates->start->name);
 	  return;
+	case unsupported_on_arch:
+	  as_bad (_("`%s' is not supported on `%s%s'"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  cpu_arch_name ? cpu_arch_name : default_arch,
+		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
+	  return;
+	case unsupported_64bit:
+	  as_bad (_("`%s' is %s supported in 64-bit mode"),
+		  pass1_mnem ? pass1_mnem : current_templates->start->name,
+		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
 	  break;
@@ -5601,16 +5632,13 @@ parse_insn (const char *line, char *mnem
 	return l;
     }
 
-  if (!(supported & CPU_FLAGS_64BIT_MATCH))
-    as_bad (flag_code == CODE_64BIT
-	    ? _("`%s' is not supported in 64-bit mode")
-	    : _("`%s' is only supported in 64-bit mode"),
-	    current_templates->start->name);
-  else
-    as_bad (_("`%s' is not supported on `%s%s'"),
-	    current_templates->start->name,
-	    cpu_arch_name ? cpu_arch_name : default_arch,
-	    cpu_sub_arch_name ? cpu_sub_arch_name : "");
+  if (pass1)
+    {
+      if (supported & CPU_FLAGS_64BIT_MATCH)
+        i.error = unsupported_on_arch;
+      else
+        i.error = unsupported_64bit;
+    }
 
   return NULL;
 }
--- a/gas/testsuite/gas/i386/movs.s
+++ b/gas/testsuite/gas/i386/movs.s
@@ -30,4 +30,10 @@ movs:
 .ifdef x86_64
 	movswq	%ax,%rax
 	movswq	(%rax),%rax
+
+	movsl	%eax,%rax
+	movsl	(%rax),%rax
+
+	movslq	%eax,%rax
+	movslq	(%rax),%rax
 .endif
--- a/gas/testsuite/gas/i386/movx64.l
+++ b/gas/testsuite/gas/i386/movx64.l
@@ -241,6 +241,46 @@
 [ 	]*[1-9][0-9]*[ 	]+movswq	%eax, %rcx
 [ 	]*[1-9][0-9]*[ 	]+movswq	%rax, %rcx
 [ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movsl	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movsl	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movsl	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cl
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cl
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %cx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %cx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%eax, %ecx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %ecx
+[ 	]*[1-9][0-9]*[ 	]*
+[ 	]*[1-9][0-9]*[ 	]+movslq	%al, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%ax, %rcx
+[ 	]*[1-9][0-9]* \?\?\?\? 4863C8[ 	]+movslq	%eax, %rcx
+[ 	]*[1-9][0-9]*[ 	]+movslq	%rax, %rcx
+[ 	]*[1-9][0-9]*[ 	]*
 [ 	]*[1-9][0-9]*[ 	]+movzx:
 [ 	]*[1-9][0-9]*[ 	]+movzx	%al, %cl
 [ 	]*[1-9][0-9]*[ 	]+movzx	%ax, %cl
--- a/gas/testsuite/gas/i386/movx64.s
+++ b/gas/testsuite/gas/i386/movx64.s
@@ -241,6 +241,46 @@ movsx:
 	movswq	%eax, %rcx
 	movswq	%rax, %rcx
 
+	movsl	%al, %cl
+	movsl	%ax, %cl
+	movsl	%eax, %cl
+	movsl	%rax, %cl
+
+	movsl	%al, %cx
+	movsl	%ax, %cx
+	movsl	%eax, %cx
+	movsl	%rax, %cx
+
+	movsl	%al, %ecx
+	movsl	%ax, %ecx
+	movsl	%eax, %ecx
+	movsl	%rax, %ecx
+
+	movsl	%al, %rcx
+	movsl	%ax, %rcx
+	movsl	%eax, %rcx
+	movsl	%rax, %rcx
+
+	movslq	%al, %cl
+	movslq	%ax, %cl
+	movslq	%eax, %cl
+	movslq	%rax, %cl
+
+	movslq	%al, %cx
+	movslq	%ax, %cx
+	movslq	%eax, %cx
+	movslq	%rax, %cx
+
+	movslq	%al, %ecx
+	movslq	%ax, %ecx
+	movslq	%eax, %ecx
+	movslq	%rax, %ecx
+
+	movslq	%al, %rcx
+	movslq	%ax, %rcx
+	movslq	%eax, %rcx
+	movslq	%rax, %rcx
+
 movzx:
 	movzx	%al, %cl
 	movzx	%ax, %cl
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -164,9 +164,7 @@ movbe, 0x0f38f0, None, CpuMovbe, D|Modrm
 // Move with sign extend.
 movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf|Pass2, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|Pass2, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
-// "movslq" must not be converted into "movsl" to avoid conflict with the
-// "movsl" string move instruction.
-movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
+movsl, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Pass2, { Reg32|Unspecified|BaseIndex, Reg64 }
 movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
 movsxd, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 5/7] ix86: don't recognize/derive Q suffix in the common case
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
                   ` (3 preceding siblings ...)
  2022-10-05  7:24 ` [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
@ 2022-10-05  7:24 ` Jan Beulich
  2022-10-11 17:49   ` H.J. Lu
  2022-10-05  7:25 ` [PATCH v3 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
  2022-10-05  7:25 ` [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
  6 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:24 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Have its use, except where actually legitimate, result in the same "only
supported in 64-bit mode" diagnostic as emitted for other 64-bit only
insns. Also suppress deriving of the suffix in Intel mode except in the
legitimate cases. This in exchange allows dropping the respective code
from match_template().

Oddly enough despite gcc's preference towards FILDQ and FIST{,T}Q we
had no testcase whatsoever for these. Therefore such tests are being
added. Note that the removed line in the x86-64-lfence-load testcase
was redundant with the exact same one a few lines up.
---
With gcc's preference towards FILDQ / FIST{,T}Q I wonder whether the
disassembler wouldn't better emit a Q suffix instead of the LL one.
---
v3: Re-base over changes to earlier patches.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4826,7 +4826,7 @@ void
 md_assemble (char *line)
 {
   unsigned int j;
-  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy = NULL;
+  char mnemonic[MAX_MNEM_SIZE], mnem_suffix = 0, *copy = NULL;
   const char *end, *pass1_mnem = NULL;
   enum i386_error pass1_err = 0;
   const insn_template *t;
@@ -4858,6 +4858,7 @@ md_assemble (char *line)
 	    goto no_match;
 	  /* No point in trying a 2nd pass - it'll only find the same suffix
 	     again.  */
+	  mnem_suffix = i.suffix;
 	  goto match_error;
 	}
       return;
@@ -5013,9 +5014,15 @@ md_assemble (char *line)
 		  cpu_sub_arch_name ? cpu_sub_arch_name : "");
 	  return;
 	case unsupported_64bit:
-	  as_bad (_("`%s' is %s supported in 64-bit mode"),
-		  pass1_mnem ? pass1_mnem : current_templates->start->name,
-		  flag_code == CODE_64BIT ? _("not") : _("only"));
+	  if (ISLOWER (mnem_suffix))
+	    as_bad (_("`%s%c' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    mnem_suffix,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
+	  else
+	    as_bad (_("`%s' is %s supported in 64-bit mode"),
+		    pass1_mnem ? pass1_mnem : current_templates->start->name,
+		    flag_code == CODE_64BIT ? _("not") : _("only"));
 	  return;
 	case invalid_sib_address:
 	  err_msg = _("invalid SIB address");
@@ -5358,6 +5365,23 @@ md_assemble (char *line)
     last_insn.kind = last_insn_other;
 }
 
+/* The Q suffix is generally valid only in 64-bit mode, with very few
+   exceptions: fild, fistp, fisttp, and cmpxchg8b.  Note that for fild
+   and fisttp only one of their two templates is matched below: That's
+   sufficient since other relevant attributes are the same between both
+   respective templates.  */
+static INLINE bool q_suffix_allowed(const insn_template *t)
+{
+  return flag_code == CODE_64BIT
+	 || (t->opcode_modifier.opcodespace == SPACE_BASE
+	     && t->base_opcode == 0xdf
+	     && (t->extension_opcode & 1)) /* fild / fistp / fisttp */
+	 || (t->opcode_modifier.opcodespace == SPACE_0F
+	     && t->base_opcode == 0xc7
+	     && t->opcode_modifier.opcodeprefix == PREFIX_NONE
+	     && t->extension_opcode == 1) /* cmpxchg8b */;
+}
+
 static const char *
 parse_insn (const char *line, char *mnemonic)
 {
@@ -5628,6 +5652,10 @@ parse_insn (const char *line, char *mnem
   for (t = current_templates->start; t < current_templates->end; ++t)
     {
       supported |= cpu_flags_match (t);
+
+      if (i.suffix == QWORD_MNEM_SUFFIX && !q_suffix_allowed (t))
+	supported &= ~CPU_FLAGS_64BIT_MATCH;
+
       if (supported == CPU_FLAGS_PERFECT_MATCH)
 	return l;
     }
@@ -6663,20 +6691,12 @@ match_template (char mnem_suffix)
       for (j = 0; j < MAX_OPERANDS; j++)
 	operand_types[j] = t->operand_types[j];
 
-      /* In general, don't allow
-	 - 64-bit operands outside of 64-bit mode,
-	 - 32-bit operands on pre-386.  */
+      /* In general, don't allow 32-bit operands on pre-386.  */
       specific_error = progress (mnem_suffix ? invalid_instruction_suffix
 					     : operand_size_mismatch);
       j = i.imm_operands + (t->operands > i.imm_operands + 1);
-      if (((i.suffix == QWORD_MNEM_SUFFIX
-	    && flag_code != CODE_64BIT
-	    && !(t->opcode_modifier.opcodespace == SPACE_0F
-		 && t->base_opcode == 0xc7
-		 && t->opcode_modifier.opcodeprefix == PREFIX_NONE
-		 && t->extension_opcode == 1) /* cmpxchg8b */)
-	   || (i.suffix == LONG_MNEM_SUFFIX
-	       && !cpu_arch_flags.bitfield.cpui386))
+      if (i.suffix == LONG_MNEM_SUFFIX
+	  && !cpu_arch_flags.bitfield.cpui386
 	  && (intel_syntax
 	      ? (t->opcode_modifier.mnemonicsize != IGNORESIZE
 		 && !intel_float_operand (t->name))
--- a/gas/config/tc-i386-intel.c
+++ b/gas/config/tc-i386-intel.c
@@ -824,7 +824,7 @@ i386_intel_operand (char *operand_string
 		    continue;
 		  break;
 		case QWORD_MNEM_SUFFIX:
-		  if (t->opcode_modifier.no_qsuf)
+		  if (t->opcode_modifier.no_qsuf || !q_suffix_allowed (t))
 		    continue;
 		  break;
 		case SHORT_MNEM_SUFFIX:
--- a/gas/testsuite/gas/i386/opcode.d
+++ b/gas/testsuite/gas/i386/opcode.d
@@ -592,6 +592,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	add    \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xor    \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adc    \$0x1,%bl
--- a/gas/testsuite/gas/i386/opcode.s
+++ b/gas/testsuite/gas/i386/opcode.s
@@ -592,6 +592,11 @@ foo:
  cmovpe  0x90909090(%eax),%dx
  cmovpo 0x90909090(%eax),%dx
 
+ fildq  (%eax)
+ fildll (%eax)
+ fistpq (%eax)
+ fistpll (%eax)
+
 	.byte 0x82, 0xc3, 0x01
 	.byte 0x82, 0xf3, 0x01
 	.byte 0x82, 0xd3, 0x01
--- a/gas/testsuite/gas/i386/opcode-intel.d
+++ b/gas/testsuite/gas/i386/opcode-intel.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnp edx,DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovp  dx,WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnp dx,WORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 28                	fild   QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
+[ 	]*[a-f0-9]+:	df 38                	fistp  QWORD PTR \[eax\]
  +[a-f0-9]+:	82 c3 01             	add    bl,0x1
  +[a-f0-9]+:	82 f3 01             	xor    bl,0x1
  +[a-f0-9]+:	82 d3 01             	adc    bl,0x1
--- a/gas/testsuite/gas/i386/opcode-suffix.d
+++ b/gas/testsuite/gas/i386/opcode-suffix.d
@@ -593,6 +593,10 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	0f 4b 90 90 90 90 90 	cmovnpl -0x6f6f6f70\(%eax\),%edx
 [ 	]*[a-f0-9]+:	66 0f 4a 90 90 90 90 90 	cmovpw -0x6f6f6f70\(%eax\),%dx
 [ 	]*[a-f0-9]+:	66 0f 4b 90 90 90 90 90 	cmovnpw -0x6f6f6f70\(%eax\),%dx
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 28                	fildll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
+[ 	]*[a-f0-9]+:	df 38                	fistpll \(%eax\)
  +[a-f0-9]+:	82 c3 01             	addb   \$0x1,%bl
  +[a-f0-9]+:	82 f3 01             	xorb   \$0x1,%bl
  +[a-f0-9]+:	82 d3 01             	adcb   \$0x1,%bl
--- a/gas/testsuite/gas/i386/sse3.d
+++ b/gas/testsuite/gas/i386/sse3.d
@@ -13,29 +13,30 @@ Disassembly of section .text:
   10:	df 88 90 90 90 90 [ 	]*fisttps -0x6f6f6f70\(%eax\)
   16:	db 88 90 90 90 90 [ 	]*fisttpl -0x6f6f6f70\(%eax\)
   1c:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
-  22:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
-  27:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
-  2b:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
-  2f:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
-  33:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
-  37:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
-  3b:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
-  3f:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
-  44:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
-  48:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4b:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
-  4e:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
-  52:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  56:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
-  5a:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
-  5e:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
-  62:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
-  66:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  69:	0f 01 c9 [ 	]*mwait  %eax,%ecx
-  6c:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  70:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
-  74:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
-  78:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90 [ 	]*fisttpll -0x6f6f6f70\(%eax\)
+[ 	]*[0-9a-f]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%ebp\),%xmm4
+[ 	]*[0-9a-f]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
+[ 	]*[0-9a-f]+:	f2 0f 7c 37 [ 	]*haddps \(%edi\),%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 7c f8 [ 	]*haddps %xmm0,%xmm7
+[ 	]*[0-9a-f]+:	66 0f 7d c1 [ 	]*hsubpd %xmm1,%xmm0
+[ 	]*[0-9a-f]+:	66 0f 7d 0a [ 	]*hsubpd \(%edx\),%xmm1
+[ 	]*[0-9a-f]+:	f2 0f 7d d2 [ 	]*hsubps %xmm2,%xmm2
+[ 	]*[0-9a-f]+:	f2 0f 7d 1c 24 [ 	]*hsubps \(%esp\),%xmm3
+[ 	]*[0-9a-f]+:	f2 0f f0 2e [ 	]*lddqu  \(%esi\),%xmm5
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	0f 01 c8 [ 	]*monitor %eax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 f7 [ 	]*movddup %xmm7,%xmm6
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f3 0f 16 01 [ 	]*movshdup \(%ecx\),%xmm0
+[ 	]*[0-9a-f]+:	f3 0f 16 ca [ 	]*movshdup %xmm2,%xmm1
+[ 	]*[0-9a-f]+:	f3 0f 12 13 [ 	]*movsldup \(%ebx\),%xmm2
+[ 	]*[0-9a-f]+:	f3 0f 12 dc [ 	]*movsldup %xmm4,%xmm3
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	0f 01 c9 [ 	]*mwait  %eax,%ecx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	67 0f 01 c8 [ 	]*monitor %ax,%ecx,%edx
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
+[ 	]*[0-9a-f]+:	f2 0f 12 38 [ 	]*movddup \(%eax\),%xmm7
 [ 	]*[0-9a-f]+:	0f 01 c8[ 	]+monitor %eax,%ecx,%edx
 [ 	]*[0-9a-f]+:	67 0f 01 c8[ 	]+monitor %ax,%ecx,%edx
 [ 	]*[0-9a-f]+:	0f 01 c9[ 	]+mwait  %eax,%ecx
--- a/gas/testsuite/gas/i386/sse3.s
+++ b/gas/testsuite/gas/i386/sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x90909090(%eax)
 	fisttpl		0x90909090(%eax)
+	fisttpq		0x90909090(%eax)
 	fisttpll	0x90909090(%eax)
 	haddpd		0x0(%ebp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/sse3-intel.d
+++ b/gas/testsuite/gas/i386/sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[0-9a-f]+:	df 88 90 90 90 90[ 	]+fisttp WORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	db 88 90 90 90 90[ 	]+fisttp DWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
+[ 	]*[0-9a-f]+:	dd 88 90 90 90 90[ 	]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
 [ 	]*[0-9a-f]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[ebp(\+0x0)\]
 [ 	]*[0-9a-f]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[0-9a-f]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[edi\]
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -44,16 +44,21 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	db 55 00             	fistl  0x0\(%rbp\)
  +[a-f0-9]+:	df 55 00             	fists  0x0\(%rbp\)
+ +[a-f0-9]+:	db 5d 00             	fistpl 0x0\(%rbp\)
+ +[a-f0-9]+:	df 5d 00             	fistps 0x0\(%rbp\)
+ +[a-f0-9]+:	df 7d 00             	fistpll 0x0\(%rbp\)
  +[a-f0-9]+:	db 45 00             	fildl  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	df 6d 00             	fildll 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b dd 75 00          	fsave  0x0\(%rbp\)
  +[a-f0-9]+:	dd 65 00             	frstor 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
- +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
- +[a-f0-9]+:	0f ae e8             	lfence
+ +[a-f0-9]+:	db 4d 00             	fisttpl 0x0\(%rbp\)
  +[a-f0-9]+:	df 4d 00             	fisttps 0x0\(%rbp\)
+ +[a-f0-9]+:	dd 4d 00             	fisttpll 0x0\(%rbp\)
  +[a-f0-9]+:	d9 65 00             	fldenv 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	9b d9 75 00          	fstenv 0x0\(%rbp\)
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -27,12 +27,17 @@ _start:
 	flds (%rbp)
 	fistl (%rbp)
 	fists (%rbp)
+	fistpl (%rbp)
+	fistps (%rbp)
+	fistpq (%rbp)
 	fildl (%rbp)
 	filds (%rbp)
+	fildq (%rbp)
 	fsave (%rbp)
 	frstor (%rbp)
-	filds (%rbp)
+	fisttpl (%rbp)
 	fisttps (%rbp)
+	fisttpq (%rbp)
 	fldenv (%rbp)
 	fstenv (%rbp)
 	fadds  (%rbp)
--- a/gas/testsuite/gas/i386/x86-64-sse3.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3.d
@@ -13,6 +13,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00 [ 	]*fisttps 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00 [ 	]*fisttpl 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00 [ 	]*fisttpll 0x909090\(%rax\)
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00 [ 	]*haddpd 0x0\(%rbp\),%xmm4
 [ 	]*[a-f0-9]+:	66 0f 7c ee [ 	]*haddpd %xmm6,%xmm5
 [ 	]*[a-f0-9]+:	f2 0f 7c 37 [ 	]*haddps \(%rdi\),%xmm6
--- a/gas/testsuite/gas/i386/x86-64-sse3.s
+++ b/gas/testsuite/gas/i386/x86-64-sse3.s
@@ -8,6 +8,7 @@ foo:
 	addsubps	%xmm4,%xmm3
 	fisttps		0x909090(%rax)
 	fisttpl		0x909090(%rax)
+	fisttpq		0x909090(%rax)
 	fisttpll	0x909090(%rax)
 	haddpd		0x0(%rbp),%xmm4
 	haddpd		%xmm6,%xmm5
--- a/gas/testsuite/gas/i386/x86-64-sse3-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-sse3-intel.d
@@ -14,6 +14,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	df 88 90 90 90 00[ 	]+fisttp WORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	db 88 90 90 90 00[ 	]+fisttp DWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
+[ 	]*[a-f0-9]+:	dd 88 90 90 90 00[ 	]+fisttp QWORD PTR \[rax\+0x909090\]
 [ 	]*[a-f0-9]+:	66 0f 7c 65 00[ 	]+haddpd xmm4,(XMMWORD PTR )?\[rbp(\+0x0)\]
 [ 	]*[a-f0-9]+:	66 0f 7c ee[ 	]+haddpd xmm5,xmm6
 [ 	]*[a-f0-9]+:	f2 0f 7c 37[ 	]+haddps xmm6,(XMMWORD PTR )?\[rdi\]


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
                   ` (4 preceding siblings ...)
  2022-10-05  7:24 ` [PATCH v3 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
@ 2022-10-05  7:25 ` Jan Beulich
  2022-10-11 17:50   ` H.J. Lu
  2022-10-05  7:25 ` [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
  6 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:25 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

In commit 1212781b35c9 ("ix86: allow HLE store of accumulator to
absolute address") I was wrong to exclude 64-bit code. Dropping the
check also leads to better diagnostics in 64-bit code ("MOV", after
all, isn't invalid with "XRELEASE").

While there also limit the amount of further checks done: The operand
type checks that were there were effectively redundant with other ones
anyway, plus it's quite fine to also have "xrelease mov <disp>, %eax"
look for the next MOV template (in fact again also improving
diagnostics).
---
v2: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -6819,12 +6819,9 @@ match_template (char mnem_suffix)
 	    continue;
 	  /* xrelease mov %eax, <disp> is another special case. It must not
 	     match the accumulator-only encoding of mov.  */
-	  if (flag_code != CODE_64BIT
-	      && i.hle_prefix
+	  if (i.hle_prefix
 	      && t->base_opcode == 0xa0
-	      && t->opcode_modifier.opcodespace == SPACE_BASE
-	      && i.types[0].bitfield.instance == Accum
-	      && (i.flags[1] & Operand_Mem))
+	      && t->opcode_modifier.opcodespace == SPACE_BASE)
 	    continue;
 	  /* Fall through.  */
 
--- a/gas/testsuite/gas/i386/x86-64-hle-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-hle-intel.d
@@ -425,6 +425,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 20 01          	lock xacquire and BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f0 f3 20 01          	lock xrelease and BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f3 88 01             	xrelease mov BYTE PTR \[rcx\],al
+[ 	]*[a-f0-9]+:	f3 88 04 25 78 56 34 12 	xrelease mov BYTE PTR (ds:)?0x12345678,al
+[ 	]*[a-f0-9]+:	67 f3 88 04 25 21 43 65 87 	xrelease mov BYTE PTR \[eiz\*1\+0x87654321\],al
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or BYTE PTR \[rcx\],al
 [ 	]*[a-f0-9]+:	f3 f0 08 01          	xrelease lock or BYTE PTR \[rcx\],al
@@ -476,6 +478,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 66 21 01       	lock xacquire and WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	f0 f3 66 21 01       	lock xrelease and WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f3 89 01          	xrelease mov WORD PTR \[rcx\],ax
+[ 	]*[a-f0-9]+:	66 f3 89 04 25 78 56 34 12 	xrelease mov WORD PTR (ds:)?0x12345678,ax
+[ 	]*[a-f0-9]+:	67 66 f3 89 04 25 21 43 65 87 	xrelease mov WORD PTR \[eiz\*1\+0x87654321\],ax
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or WORD PTR \[rcx\],ax
 [ 	]*[a-f0-9]+:	66 f3 f0 09 01       	xrelease lock or WORD PTR \[rcx\],ax
@@ -527,6 +531,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 21 01          	lock xacquire and DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f0 f3 21 01          	lock xrelease and DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f3 89 01             	xrelease mov DWORD PTR \[rcx\],eax
+[ 	]*[a-f0-9]+:	f3 89 04 25 78 56 34 12 	xrelease mov DWORD PTR (ds:)?0x12345678,eax
+[ 	]*[a-f0-9]+:	67 f3 89 04 25 21 43 65 87 	xrelease mov DWORD PTR \[eiz\*1\+0x87654321\],eax
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or DWORD PTR \[rcx\],eax
 [ 	]*[a-f0-9]+:	f3 f0 09 01          	xrelease lock or DWORD PTR \[rcx\],eax
@@ -578,6 +584,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 48 21 01       	lock xacquire and QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f0 f3 48 21 01       	lock xrelease and QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f3 48 89 01          	xrelease mov QWORD PTR \[rcx\],rax
+[ 	]*[a-f0-9]+:	f3 48 89 04 25 78 56 34 12 	xrelease mov QWORD PTR (ds:)?0x12345678,rax
+[ 	]*[a-f0-9]+:	67 f3 48 89 04 25 21 43 65 87 	xrelease mov QWORD PTR \[eiz\*1\+0x87654321\],rax
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or QWORD PTR \[rcx\],rax
 [ 	]*[a-f0-9]+:	f3 f0 48 09 01       	xrelease lock or QWORD PTR \[rcx\],rax
--- a/gas/testsuite/gas/i386/x86-64-hle.d
+++ b/gas/testsuite/gas/i386/x86-64-hle.d
@@ -424,6 +424,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 20 01          	lock xacquire and %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 20 01          	lock xrelease and %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 88 01             	xrelease mov %al,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 88 04 25 78 56 34 12 	xrelease mov %al,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 88 04 25 21 43 65 87 	xrelease mov %al,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 08 01          	xacquire lock or %al,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 08 01          	xrelease lock or %al,\(%rcx\)
@@ -475,6 +477,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 66 21 01       	lock xacquire and %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 66 21 01       	lock xrelease and %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f3 89 01          	xrelease mov %ax,\(%rcx\)
+[ 	]*[a-f0-9]+:	66 f3 89 04 25 78 56 34 12 	xrelease mov %ax,0x12345678
+[ 	]*[a-f0-9]+:	67 66 f3 89 04 25 21 43 65 87 	xrelease mov %ax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f2 f0 09 01       	xacquire lock or %ax,\(%rcx\)
 [ 	]*[a-f0-9]+:	66 f3 f0 09 01       	xrelease lock or %ax,\(%rcx\)
@@ -526,6 +530,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 21 01          	lock xacquire and %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 21 01          	lock xrelease and %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 89 01             	xrelease mov %eax,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 89 04 25 78 56 34 12 	xrelease mov %eax,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 89 04 25 21 43 65 87 	xrelease mov %eax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 09 01          	xacquire lock or %eax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 09 01          	xrelease lock or %eax,\(%rcx\)
@@ -577,6 +583,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	f0 f2 48 21 01       	lock xacquire and %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f0 f3 48 21 01       	lock xrelease and %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 48 89 01          	xrelease mov %rax,\(%rcx\)
+[ 	]*[a-f0-9]+:	f3 48 89 04 25 78 56 34 12 	xrelease mov %rax,0x12345678
+[ 	]*[a-f0-9]+:	67 f3 48 89 04 25 21 43 65 87 	xrelease mov %rax,0x87654321\(,%eiz,1\)
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f2 f0 48 09 01       	xacquire lock or %rax,\(%rcx\)
 [ 	]*[a-f0-9]+:	f3 f0 48 09 01       	xrelease lock or %rax,\(%rcx\)
--- a/gas/testsuite/gas/i386/x86-64-hle.s
+++ b/gas/testsuite/gas/i386/x86-64-hle.s
@@ -442,6 +442,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andb %al,(%rcx)
 	.byte 0xf0; .byte 0xf3; andb %al,(%rcx)
 	xrelease movb %al,(%rcx)
+	xrelease movb %al,0x12345678
+	xrelease addr32 movb %al,0x87654321
 	xacquire lock orb %al,(%rcx)
 	lock xacquire orb %al,(%rcx)
 	xrelease lock orb %al,(%rcx)
@@ -496,6 +498,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andw %ax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andw %ax,(%rcx)
 	xrelease movw %ax,(%rcx)
+	xrelease movw %ax,0x12345678
+	xrelease addr32 movw %ax,0x87654321
 	xacquire lock orw %ax,(%rcx)
 	lock xacquire orw %ax,(%rcx)
 	xrelease lock orw %ax,(%rcx)
@@ -550,6 +554,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andl %eax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andl %eax,(%rcx)
 	xrelease movl %eax,(%rcx)
+	xrelease movl %eax,0x12345678
+	xrelease addr32 movl %eax,0x87654321
 	xacquire lock orl %eax,(%rcx)
 	lock xacquire orl %eax,(%rcx)
 	xrelease lock orl %eax,(%rcx)
@@ -604,6 +610,8 @@ _start:
 	.byte 0xf0; .byte 0xf2; andq %rax,(%rcx)
 	.byte 0xf0; .byte 0xf3; andq %rax,(%rcx)
 	xrelease movq %rax,(%rcx)
+	xrelease movq %rax,0x12345678
+	xrelease addr32 movq %rax,0x87654321
 	xacquire lock orq %rax,(%rcx)
 	lock xacquire orq %rax,(%rcx)
 	xrelease lock orq %rax,(%rcx)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check
  2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
                   ` (5 preceding siblings ...)
  2022-10-05  7:25 ` [PATCH v3 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
@ 2022-10-05  7:25 ` Jan Beulich
  2022-10-11 17:57   ` H.J. Lu
  6 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-05  7:25 UTC (permalink / raw)
  To: Binutils; +Cc: H.J. Lu

Having it in match_template() is unhelpful. Neither does looking for the
next template to possibly match make any sense in that case, nor is the
resulting diagnostic making clear what the problem is.

While moving the check, also generalize it to include all SIMD and VEX-
encoded insns. This way an existing conditional can be re-used in
md_assemble(). Note though that this still leaves a lof of insns which
are also wrong to use with these relocations.

Further fold the remaining check (BFD_RELOC_386_GOT32) with the XRELEASE
related one a few lines down. This again allows re-using an existing
conditional.
---
Is the set of relocations enumerated here actually complete?
---
v2: New.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -5119,14 +5119,30 @@ md_assemble (char *line)
       return;
     }
 
-  /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
-  if (i.prefix[DATA_PREFIX]
-      && (is_any_vex_encoding (&i.tm)
-	  || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
-	  || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX))
+  if (is_any_vex_encoding (&i.tm)
+      || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
+      || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX)
     {
-      as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
-      return;
+      /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
+      if (i.prefix[DATA_PREFIX])
+	{
+	  as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
+	  return;
+	}
+
+      /* Don't allow e.g. KMOV in TLS code sequences.  */
+      for (j = i.imm_operands; j < i.operands; ++j)
+	switch (i.reloc[j])
+	  {
+	  case BFD_RELOC_386_TLS_GOTIE:
+	  case BFD_RELOC_386_TLS_LE_32:
+	  case BFD_RELOC_X86_64_GOTTPOFF:
+	  case BFD_RELOC_X86_64_TLSLD:
+	    as_bad (_("TLS relocation cannot be used with `%s'"), i.tm.name);
+	    return;
+	  default:
+	    break;
+	  }
     }
 
   /* Check if HLE prefix is OK.  */
@@ -6767,26 +6783,6 @@ match_template (char mnem_suffix)
 	    }
 	}
 
-      switch (i.reloc[0])
-	{
-	case BFD_RELOC_386_GOT32:
-	  /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
-	  if (t->base_opcode == 0xa0
-	      && t->opcode_modifier.opcodespace == SPACE_BASE)
-	    continue;
-	  break;
-	case BFD_RELOC_386_TLS_GOTIE:
-	case BFD_RELOC_386_TLS_LE_32:
-	case BFD_RELOC_X86_64_GOTTPOFF:
-	case BFD_RELOC_X86_64_TLSLD:
-	  /* Don't allow KMOV in TLS code sequences.  */
-	  if (t->opcode_modifier.vex)
-	    continue;
-	  break;
-	default:
-	  break;
-	}
-
       /* We check register size if needed.  */
       if (t->opcode_modifier.checkregsize)
 	{
@@ -6817,12 +6813,19 @@ match_template (char mnem_suffix)
 	      && i.types[1].bitfield.instance == Accum
 	      && i.types[1].bitfield.dword)
 	    continue;
-	  /* xrelease mov %eax, <disp> is another special case. It must not
-	     match the accumulator-only encoding of mov.  */
-	  if (i.hle_prefix
-	      && t->base_opcode == 0xa0
+
+	  if (t->base_opcode == MOV_AX_DISP32
 	      && t->opcode_modifier.opcodespace == SPACE_BASE)
-	    continue;
+	    {
+	      /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
+	      if (i.reloc[0] == BFD_RELOC_386_GOT32)
+		continue;
+
+	      /* xrelease mov %eax, <disp> is another special case. It must not
+		 match the accumulator-only encoding of mov.  */
+	      if (i.hle_prefix)
+		continue;
+	    }
 	  /* Fall through.  */
 
 	case 3:


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-05  7:23 ` [PATCH v3 3/7] x86: re-work insn/suffix recognition Jan Beulich
@ 2022-10-05 23:52   ` H.J. Lu
  2022-10-06  6:15     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-05 23:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 5, 2022 at 12:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> PR gas/29524
> Having templates with a suffix explicitly present has always been
> quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
> a suitable template _and_ didn't itself already need to trim off a
> suffix to find a match at all. This requires error reporting adjustments
> (albeit luckily fewer than I was afraid might be necessary), as errors
> previously reported during matching now need deferring until after the
> 2nd pass (because, obviously, we must not emit any error if the 2nd pass
> succeeds).
>
> PR gas/29525
> Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
> templates, mixed IsString/non-IsString template groups cannot occur
> anymore. With that maybe_adjust_templates() becomes unnecessary (and is
> hence being removed).
>
> PR gas/29526
> Note further that while the additions to the intel16 testcase aren't
> really proper Intel syntax, we've been permitting all of those except
> for the MOVD variant. The test therefore is to avoid re-introducing such
> an inconsistency.
> ---
> To limit code churn I'm using "goto" for the retry loop, but I'd be
> happy to make this a proper loop either right here or in a follow-on
> change doing just the necessary re-indentation.
>
> The "too many memory references" errors which are being deleted weren't
> fully consistent anyway - even the majority of IsString insns accepts
> only a single memory operand. If we wanted to retain that, it would need
> re-introducing in md_assemble(), latching the error into i.error just
> like match_template() does.
>
> Why is "MOVQ $imm64, %reg64" being optimized but "MOVABS $imm64, %reg64"
> is not?
> ---
> v3: Limit xstrdup() to just the templates where a 2nd pass actually
>     makes sense (new Pass2 attribute).
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -297,9 +297,6 @@ struct _i386_insn
>         explicit segment overrides are given.  */
>      const reg_entry *seg[2];
>
> -    /* Copied first memory operand string, for re-checking.  */
> -    char *memop1_string;
> -
>      /* PREFIX holds all the given prefix opcodes (usually null).
>         PREFIXES is the number of prefix opcodes.  */
>      unsigned int prefixes;
> @@ -4273,7 +4270,20 @@ optimize_encoding (void)
>            movq $imm31, %r64   -> movl $imm31, %r32
>            movq $imm32, %r64   -> movl $imm32, %r32
>          */
> -      i.tm.opcode_modifier.norex64 = 1;
> +      i.tm.opcode_modifier.size = SIZE32;
> +      if (i.imm_operands)
> +       {
> +         i.types[0].bitfield.imm32 = 1;
> +         i.types[0].bitfield.imm32s = 0;
> +         i.types[0].bitfield.imm64 = 0;
> +       }
> +      else
> +       {
> +         i.types[0].bitfield.dword = 1;
> +         i.types[0].bitfield.qword = 0;
> +       }
> +      i.types[1].bitfield.dword = 1;
> +      i.types[1].bitfield.qword = 0;
>        if (i.tm.base_opcode == 0xb8 || (i.tm.base_opcode | 1) == 0xc7)
>         {
>           /* Handle
> @@ -4283,11 +4293,6 @@ optimize_encoding (void)
>           i.tm.operand_types[0].bitfield.imm32 = 1;
>           i.tm.operand_types[0].bitfield.imm32s = 0;
>           i.tm.operand_types[0].bitfield.imm64 = 0;
> -         i.types[0].bitfield.imm32 = 1;
> -         i.types[0].bitfield.imm32s = 0;
> -         i.types[0].bitfield.imm64 = 0;
> -         i.types[1].bitfield.dword = 1;
> -         i.types[1].bitfield.qword = 0;
>           if ((i.tm.base_opcode | 1) == 0xc7)
>             {
>               /* Handle
> @@ -4819,11 +4824,14 @@ void
>  md_assemble (char *line)
>  {
>    unsigned int j;
> -  char mnemonic[MAX_MNEM_SIZE], mnem_suffix;
> -  const char *end;
> +  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy = NULL;
> +  const char *end, *pass1_mnem = NULL;
> +  enum i386_error pass1_err = 0;
>    const insn_template *t;
>
>    /* Initialize globals.  */
> +  current_templates = NULL;
> + retry:
>    memset (&i, '\0', sizeof (i));
>    i.rounding.type = rc_none;
>    for (j = 0; j < MAX_OPERANDS; j++)
> @@ -4838,16 +4846,26 @@ md_assemble (char *line)
>
>    end = parse_insn (line, mnemonic);
>    if (end == NULL)
> -    return;
> +    {
> +      if (pass1_mnem != NULL)
> +       goto match_error;
> +      return;
> +    }
> +  if (current_templates->start->opcode_modifier.pass2)
> +    {
> +      /* Make a copy of the full line in case we need to retry.  */
> +      copy = xstrdup (line);
> +    }
>    line += end - line;
>    mnem_suffix = i.suffix;
>
>    line = parse_operands (line, mnemonic);
>    this_operand = -1;
> -  xfree (i.memop1_string);
> -  i.memop1_string = NULL;
>    if (line == NULL)
> -    return;
> +    {
> +      free (copy);
> +      return;
> +    }
>
>    /* Now we've parsed the mnemonic into a set of templates, and have the
>       operands at hand.  */
> @@ -4923,7 +4941,97 @@ md_assemble (char *line)
>       with the template operand types.  */
>
>    if (!(t = match_template (mnem_suffix)))
> -    return;
> +    {
> +      const char *err_msg;
> +
> +      if (copy && !mnem_suffix)
> +       {
> +         pass1_err = i.error;
> +         pass1_mnem = current_templates->start->name;
> +         line = copy;
> +         copy = NULL;
> +         goto retry;
> +       }
> +      free (copy);
> +  match_error:
> +      switch (pass1_mnem ? pass1_err : i.error)
> +       {
> +       default:
> +         abort ();
> +       case operand_size_mismatch:
> +         err_msg = _("operand size mismatch");
> +         break;
> +       case operand_type_mismatch:
> +         err_msg = _("operand type mismatch");
> +         break;
> +       case register_type_mismatch:
> +         err_msg = _("register type mismatch");
> +         break;
> +       case number_of_operands_mismatch:
> +         err_msg = _("number of operands mismatch");
> +         break;
> +       case invalid_instruction_suffix:
> +         err_msg = _("invalid instruction suffix");
> +         break;
> +       case bad_imm4:
> +         err_msg = _("constant doesn't fit in 4 bits");
> +         break;
> +       case unsupported_with_intel_mnemonic:
> +         err_msg = _("unsupported with Intel mnemonic");
> +         break;
> +       case unsupported_syntax:
> +         err_msg = _("unsupported syntax");
> +         break;
> +       case unsupported:
> +         as_bad (_("unsupported instruction `%s'"),
> +                 pass1_mnem ? pass1_mnem : current_templates->start->name);
> +         return;
> +       case invalid_sib_address:
> +         err_msg = _("invalid SIB address");
> +         break;
> +       case invalid_vsib_address:
> +         err_msg = _("invalid VSIB address");
> +         break;
> +       case invalid_vector_register_set:
> +         err_msg = _("mask, index, and destination registers must be distinct");
> +         break;
> +       case invalid_tmm_register_set:
> +         err_msg = _("all tmm registers must be distinct");
> +         break;
> +       case invalid_dest_and_src_register_set:
> +         err_msg = _("destination and source registers must be distinct");
> +         break;
> +       case unsupported_vector_index_register:
> +         err_msg = _("unsupported vector index register");
> +         break;
> +       case unsupported_broadcast:
> +         err_msg = _("unsupported broadcast");
> +         break;
> +       case broadcast_needed:
> +         err_msg = _("broadcast is needed for operand of such type");
> +         break;
> +       case unsupported_masking:
> +         err_msg = _("unsupported masking");
> +         break;
> +       case mask_not_on_destination:
> +         err_msg = _("mask not on destination operand");
> +         break;
> +       case no_default_mask:
> +         err_msg = _("default mask isn't allowed");
> +         break;
> +       case unsupported_rc_sae:
> +         err_msg = _("unsupported static rounding/sae");
> +         break;
> +       case invalid_register_operand:
> +         err_msg = _("invalid register operand");
> +         break;
> +       }
> +      as_bad (_("%s for `%s'"), err_msg,
> +             pass1_mnem ? pass1_mnem : current_templates->start->name);
> +      return;
> +    }
> +
> +  free (copy);
>
>    if (sse_check != check_none
>        /* The opcode space check isn't strictly needed; it's there only to
> @@ -5224,6 +5332,7 @@ parse_insn (const char *line, char *mnem
>  {
>    const char *l = line, *token_start = l;
>    char *mnem_p;
> +  bool pass1 = !current_templates;
>    int supported;
>    const insn_template *t;
>    char *dot_p = NULL;
> @@ -5393,8 +5502,10 @@ parse_insn (const char *line, char *mnem
>        current_templates = (const templates *) str_hash_find (op_hash, mnemonic);
>      }
>
> -  if (!current_templates)
> +  if (!current_templates || !pass1)
>      {
> +      current_templates = NULL;
> +
>      check_suffix:
>        if (mnem_p > mnemonic)
>         {
> @@ -5442,7 +5553,8 @@ parse_insn (const char *line, char *mnem
>
>        if (!current_templates)
>         {
> -         as_bad (_("no such instruction: `%s'"), token_start);
> +         if (pass1)
> +           as_bad (_("no such instruction: `%s'"), token_start);
>           return NULL;
>         }
>      }
> @@ -6852,81 +6964,7 @@ match_template (char mnem_suffix)
>    if (t == current_templates->end)
>      {
>        /* We found no match.  */
> -      const char *err_msg;
> -      switch (specific_error)
> -       {
> -       default:
> -         abort ();
> -       case operand_size_mismatch:
> -         err_msg = _("operand size mismatch");
> -         break;
> -       case operand_type_mismatch:
> -         err_msg = _("operand type mismatch");
> -         break;
> -       case register_type_mismatch:
> -         err_msg = _("register type mismatch");
> -         break;
> -       case number_of_operands_mismatch:
> -         err_msg = _("number of operands mismatch");
> -         break;
> -       case invalid_instruction_suffix:
> -         err_msg = _("invalid instruction suffix");
> -         break;
> -       case bad_imm4:
> -         err_msg = _("constant doesn't fit in 4 bits");
> -         break;
> -       case unsupported_with_intel_mnemonic:
> -         err_msg = _("unsupported with Intel mnemonic");
> -         break;
> -       case unsupported_syntax:
> -         err_msg = _("unsupported syntax");
> -         break;
> -       case unsupported:
> -         as_bad (_("unsupported instruction `%s'"),
> -                 current_templates->start->name);
> -         return NULL;
> -       case invalid_sib_address:
> -         err_msg = _("invalid SIB address");
> -         break;
> -       case invalid_vsib_address:
> -         err_msg = _("invalid VSIB address");
> -         break;
> -       case invalid_vector_register_set:
> -         err_msg = _("mask, index, and destination registers must be distinct");
> -         break;
> -       case invalid_tmm_register_set:
> -         err_msg = _("all tmm registers must be distinct");
> -         break;
> -       case invalid_dest_and_src_register_set:
> -         err_msg = _("destination and source registers must be distinct");
> -         break;
> -       case unsupported_vector_index_register:
> -         err_msg = _("unsupported vector index register");
> -         break;
> -       case unsupported_broadcast:
> -         err_msg = _("unsupported broadcast");
> -         break;
> -       case broadcast_needed:
> -         err_msg = _("broadcast is needed for operand of such type");
> -         break;
> -       case unsupported_masking:
> -         err_msg = _("unsupported masking");
> -         break;
> -       case mask_not_on_destination:
> -         err_msg = _("mask not on destination operand");
> -         break;
> -       case no_default_mask:
> -         err_msg = _("default mask isn't allowed");
> -         break;
> -       case unsupported_rc_sae:
> -         err_msg = _("unsupported static rounding/sae");
> -         break;
> -       case invalid_register_operand:
> -         err_msg = _("invalid register operand");
> -         break;
> -       }
> -      as_bad (_("%s for `%s'"), err_msg,
> -             current_templates->start->name);
> +      i.error = specific_error;
>        return NULL;
>      }
>
> @@ -11335,49 +11373,6 @@ RC_SAE_immediate (const char *imm_start)
>    return 1;
>  }
>
> -/* Only string instructions can have a second memory operand, so
> -   reduce current_templates to just those if it contains any.  */
> -static int
> -maybe_adjust_templates (void)
> -{
> -  const insn_template *t;
> -
> -  gas_assert (i.mem_operands == 1);
> -
> -  for (t = current_templates->start; t < current_templates->end; ++t)
> -    if (t->opcode_modifier.isstring)
> -      break;
> -
> -  if (t < current_templates->end)
> -    {
> -      static templates aux_templates;
> -      bool recheck;
> -
> -      aux_templates.start = t;
> -      for (; t < current_templates->end; ++t)
> -       if (!t->opcode_modifier.isstring)
> -         break;
> -      aux_templates.end = t;
> -
> -      /* Determine whether to re-check the first memory operand.  */
> -      recheck = (aux_templates.start != current_templates->start
> -                || t != current_templates->end);
> -
> -      current_templates = &aux_templates;
> -
> -      if (recheck)
> -       {
> -         i.mem_operands = 0;
> -         if (i.memop1_string != NULL
> -             && i386_index_check (i.memop1_string) == 0)
> -           return 0;
> -         i.mem_operands = 1;
> -       }
> -    }
> -
> -  return 1;
> -}
> -
>  static INLINE bool starts_memory_operand (char c)
>  {
>    return ISDIGIT (c)
> @@ -11528,17 +11523,6 @@ i386_att_operand (char *operand_string)
>        char *displacement_string_end;
>
>      do_memory_reference:
> -      if (i.mem_operands == 1 && !maybe_adjust_templates ())
> -       return 0;
> -      if ((i.mem_operands == 1
> -          && !current_templates->start->opcode_modifier.isstring)
> -         || i.mem_operands == 2)
> -       {
> -         as_bad (_("too many memory references for `%s'"),
> -                 current_templates->start->name);
> -         return 0;
> -       }
> -
>        /* Check for base index form.  We detect the base index form by
>          looking for an ')' at the end of the operand, searching
>          for the '(' matching it, and finding a REGISTER_PREFIX or ','
> @@ -11745,8 +11729,6 @@ i386_att_operand (char *operand_string)
>        if (i386_index_check (operand_string) == 0)
>         return 0;
>        i.flags[this_operand] |= Operand_Mem;
> -      if (i.mem_operands == 0)
> -       i.memop1_string = xstrdup (operand_string);
>        i.mem_operands++;
>      }
>    else
> --- a/gas/config/tc-i386-intel.c
> +++ b/gas/config/tc-i386-intel.c
> @@ -993,10 +993,7 @@ i386_intel_operand (char *operand_string
>            || intel_state.is_mem)
>      {
>        /* Memory operand.  */
> -      if (i.mem_operands == 1 && !maybe_adjust_templates ())
> -       return 0;
> -      if ((int) i.mem_operands
> -         >= 2 - !current_templates->start->opcode_modifier.isstring)
> +      if (i.mem_operands)
>         {
>           /* Handle
>
> @@ -1041,10 +1038,6 @@ i386_intel_operand (char *operand_string
>                     }
>                 }
>             }
> -
> -         as_bad (_("too many memory references for `%s'"),
> -                 current_templates->start->name);
> -         return 0;
>         }
>
>        /* Swap base and index in 16-bit memory operands like
> @@ -1158,8 +1151,6 @@ i386_intel_operand (char *operand_string
>         return 0;
>
>        i.flags[this_operand] |= Operand_Mem;
> -      if (i.mem_operands == 0)
> -       i.memop1_string = xstrdup (operand_string);
>        ++i.mem_operands;
>      }
>    else
> --- a/gas/testsuite/gas/i386/code16.s
> +++ b/gas/testsuite/gas/i386/code16.s
> @@ -1,9 +1,9 @@
>         .text
>         .code16
> -       rep; movsd
> -       rep; cmpsd
> -       rep movsd %ds:(%si),%es:(%di)
> -       rep cmpsd %es:(%di),%ds:(%si)
> +       rep; movsl
> +       rep; cmpsl
> +       rep movsl %ds:(%si),%es:(%di)
> +       rep cmpsl %es:(%di),%ds:(%si)
>
>         mov     %cr2, %ecx
>         mov     %ecx, %cr2
> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -73,6 +73,7 @@ if [gas_32_check] then {
>      run_dump_test "amd"
>      run_dump_test "katmai"
>      run_dump_test "jump"
> +    run_dump_test "movs32"
>      run_dump_test "movz32"
>      run_dump_test "relax-1"
>      run_dump_test "relax-2"
> @@ -806,6 +807,7 @@ if [gas_64_check] then {
>      run_dump_test "x86-64-segovr"
>      run_list_test "x86-64-inval-seg" "-al"
>      run_dump_test "x86-64-branch"
> +    run_dump_test "movs64"
>      run_dump_test "movz64"
>      run_dump_test "x86-64-relax-1"
>      run_dump_test "svme64"
> --- a/gas/testsuite/gas/i386/intel16.d
> +++ b/gas/testsuite/gas/i386/intel16.d
> @@ -20,4 +20,12 @@ Disassembly of section .text:
>    2c:  8d 02 [         ]*lea    \(%bp,%si\),%ax
>    2e:  8d 01 [         ]*lea    \(%bx,%di\),%ax
>    30:  8d 03 [         ]*lea    \(%bp,%di\),%ax
> -       ...
> +[      ]*[0-9a-f]+:    67 f7 13[       ]+notw[         ]+\(%ebx\)
> +[      ]*[0-9a-f]+:    66 f7 17[       ]+notl[         ]+\(%bx\)
> +[      ]*[0-9a-f]+:    67 0f 1f 03[    ]+nopw[         ]+\(%ebx\)
> +[      ]*[0-9a-f]+:    66 0f 1f 07[    ]+nopl[         ]+\(%bx\)
> +[      ]*[0-9a-f]+:    67 83 03 05[    ]+addw[         ]+\$0x5,\(%ebx\)
> +[      ]*[0-9a-f]+:    66 83 07 05[    ]+addl[         ]+\$0x5,\(%bx\)
> +[      ]*[0-9a-f]+:    67 c7 03 05 00[         ]+movw[         ]+\$0x5,\(%ebx\)
> +[      ]*[0-9a-f]+:    66 c7 07 05 00 00 00[   ]+movl[         ]+\$0x5,\(%bx\)
> +#pass
> --- a/gas/testsuite/gas/i386/intel16.s
> +++ b/gas/testsuite/gas/i386/intel16.s
> @@ -18,4 +18,14 @@
>   lea   ax, [di][bx]
>   lea   ax, [di][bp]
>
> - .p2align 4,0
> + notw  [ebx]
> + notd  [bx]
> +
> + nopw  [ebx]
> + nopd  [bx]
> +
> + addw  [ebx], 5
> + addd  [bx], 5
> +
> + movw  [ebx], 5
> + movd  [bx], 5
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/movs.s
> @@ -0,0 +1,33 @@
> +       .text
> +movs:
> +       movsb   %al,%ax
> +       movsb   (%eax),%ax
> +       movsb   %al,%eax
> +       movsb   (%eax),%eax
> +.ifdef x86_64
> +       movsb   %al,%rax
> +       movsb   (%rax),%rax
> +.endif
> +
> +       movsbw  %al,%ax
> +       movsbw  (%eax),%ax
> +       movsbl  %al,%eax
> +       movsbl  (%eax),%eax
> +.ifdef x86_64
> +       movsbq  %al,%rax
> +       movsbq  (%rax),%rax
> +.endif
> +
> +       movsw   %ax,%eax
> +       movsw   (%eax),%eax
> +.ifdef x86_64
> +       movsw   %ax,%rax
> +       movsw   (%rax),%rax
> +.endif
> +
> +       movswl  %ax,%eax
> +       movswl  (%eax),%eax
> +.ifdef x86_64
> +       movswq  %ax,%rax
> +       movswq  (%rax),%rax
> +.endif
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/movs32.d
> @@ -0,0 +1,22 @@
> +#objdump: -dw
> +#source: movs.s
> +#name: x86 mov with sign-extend (32-bit object)
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+ <movs>:
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    66 0f be 00 *   movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    0f be 00 *      movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    66 0f be 00 *   movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    0f be 00 *      movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    0f bf 00 *      movswl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    0f bf 00 *      movswl \(%eax\),%eax
> +#pass
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/movs64.d
> @@ -0,0 +1,30 @@
> +#objdump: -dw
> +#source: movs.s
> +#name: x86 mov with sign-extend (64-bit object)
> +
> +.*: +file format .*
> +
> +Disassembly of section .text:
> +
> +0+ <movs>:
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    67 66 0f be 00 *        movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    67 0f be 00 *   movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f be c0 *   movsbq %al,%rax
> +[      ]*[a-f0-9]+:    48 0f be 00 *   movsbq \(%rax\),%rax
> +[      ]*[a-f0-9]+:    66 0f be c0 *   movsbw %al,%ax
> +[      ]*[a-f0-9]+:    67 66 0f be 00 *        movsbw \(%eax\),%ax
> +[      ]*[a-f0-9]+:    0f be c0 *      movsbl %al,%eax
> +[      ]*[a-f0-9]+:    67 0f be 00 *   movsbl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f be c0 *   movsbq %al,%rax
> +[      ]*[a-f0-9]+:    48 0f be 00 *   movsbq \(%rax\),%rax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    67 0f bf 00 *   movswl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f bf c0 *   movswq %ax,%rax
> +[      ]*[a-f0-9]+:    48 0f bf 00 *   movswq \(%rax\),%rax
> +[      ]*[a-f0-9]+:    0f bf c0 *      movswl %ax,%eax
> +[      ]*[a-f0-9]+:    67 0f bf 00 *   movswl \(%eax\),%eax
> +[      ]*[a-f0-9]+:    48 0f bf c0 *   movswq %ax,%rax
> +[      ]*[a-f0-9]+:    48 0f bf 00 *   movswq \(%rax\),%rax
> +#pass
> --- a/gas/testsuite/gas/i386/movx16.l
> +++ b/gas/testsuite/gas/i386/movx16.l
> @@ -41,11 +41,11 @@
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cl
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cl
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %cx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBEC8[  ]+movsb %al, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBEC8[        ]+movsb %al, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
> @@ -82,7 +82,7 @@
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %ecx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBFC8[        ]+movsw %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movswl        %al, %cl
> --- a/gas/testsuite/gas/i386/movx32.l
> +++ b/gas/testsuite/gas/i386/movx32.l
> @@ -41,11 +41,11 @@
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cl
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cl
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %cx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBEC8[        ]+movsb %al, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBEC8[  ]+movsb %al, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
> @@ -82,7 +82,7 @@
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %cx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %ecx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBFC8[  ]+movsw %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movswl        %al, %cl
> --- a/gas/testsuite/gas/i386/movx64.l
> +++ b/gas/testsuite/gas/i386/movx64.l
> @@ -106,17 +106,17 @@
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cl
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %cl
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %cx
> +[      ]*[1-9][0-9]* \?\?\?\? 660FBEC8[        ]+movsb %al, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %cx
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %cx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBEC8[  ]+movsb %al, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
> -[      ]*[1-9][0-9]*[  ]+movsb %al, %rcx
> +[      ]*[1-9][0-9]* \?\?\?\? 480FBEC8[        ]+movsb %al, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsb %ax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsb %eax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsb %rax, %rcx
> @@ -192,12 +192,12 @@
>  [      ]*[1-9][0-9]*[  ]+movsw %rax, %cx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %ecx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %ecx
> +[      ]*[1-9][0-9]* \?\?\?\? 0FBFC8[  ]+movsw %ax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %ecx
>  [      ]*[1-9][0-9]*[  ]+movsw %rax, %ecx
>  [      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movsw %al, %rcx
> -[      ]*[1-9][0-9]*[  ]+movsw %ax, %rcx
> +[      ]*[1-9][0-9]* \?\?\?\? 480FBFC8[        ]+movsw %ax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsw %eax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movsw %rax, %rcx
>  [      ]*[1-9][0-9]*[  ]*
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -135,47 +135,37 @@
>  mov, 0xa0, None, CpuNo64, D|W|No_sSuf|No_qSuf|No_ldSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword }
>  mov, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
>  movabs, 0xa0, None, Cpu64, D|W|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
> -movq, 0xa1, None, Cpu64, D|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Disp64|Unspecified|Qword, Acc|Qword }
>  mov, 0x88, None, 0, D|W|CheckRegSize|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -movq, 0x89, None, Cpu64, D|Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease, { Reg64, Reg64|Unspecified|Qword|BaseIndex }
>  // In the 64bit mode the short form mov immediate is redefined to have
>  // 64bit value.
>  mov, 0xb0, None, 0, W|No_sSuf|No_qSuf|No_ldSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32 }
>  mov, 0xc6, 0, 0, W|Modrm|No_sSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -movq, 0xc7, 0, Cpu64, Modrm|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|HLEPrefixRelease|Optimize, { Imm32S, Reg64|Qword|Unspecified|BaseIndex }
>  mov, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
>  movabs, 0xb8, None, Cpu64, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf, { Imm64, Reg64 }
> -movq, 0xb8, None, Cpu64, Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { Imm64, Reg64 }
>  // The segment register moves accept WordReg so that a segment register
>  // can be copied to a 32 bit register, and vice versa, without using a
>  // size prefix.  When moving to a 32 bit register, the upper 16 bits
>  // are set to an implementation defined value (on the Pentium Pro, the
>  // implementation defined value is zero).
> -mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
> +mov, 0x8c, None, 0, RegMem|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { SReg, Reg16|Reg32|Reg64 }
>  mov, 0x8c, None, 0, D|Modrm|IgnoreSize|No_bSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { SReg, Word|Unspecified|BaseIndex }
> -movq, 0x8c, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { SReg, Reg64 }
> -mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
> +mov, 0x8e, None, 0, Modrm|IgnoreSize|No_bSuf|No_sSuf|No_ldSuf|NoRex64, { Reg16|Reg32|Reg64, SReg }
>  // Move to/from control debug registers.  In the 16 or 32bit modes
>  // they are 32bit.  In the 64bit mode they are 64bit.
>  mov, 0xf20, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Control, Reg32 }
>  mov, 0xf20, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Control, Reg64 }
> -movq, 0xf20, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Control, Reg64 }
>  mov, 0xf21, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Debug, Reg32 }
>  mov, 0xf21, None, Cpu64, D|RegMem|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
> -movq, 0xf21, None, Cpu64, D|RegMem|Size64|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Debug, Reg64 }
>  mov, 0xf24, None, Cpu386|CpuNo64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Test, Reg32 }
>
>  // Move after swapping the bytes
>  movbe, 0x0f38f0, None, CpuMovbe, D|Modrm|No_bSuf|No_sSuf|No_ldSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>
>  // Move with sign extend.
> -// "movsbl" & "movsbw" must not be unified into "movsb" to avoid
> -// conflict with the "movs" string move instruction.
> -movsbl, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg32 }
> -movsbw, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Byte|Unspecified|BaseIndex, Reg16 }
> -movswl, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg16|Word|Unspecified|BaseIndex, Reg32 }
> -movsbq, 0xfbe, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg8|Byte|Unspecified|BaseIndex, Reg64 }
> -movswq, 0xfbf, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg16|Word|Unspecified|BaseIndex, Reg64 }
> +movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf|Pass2, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|Pass2, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
> +// "movslq" must not be converted into "movsl" to avoid conflict with the
> +// "movsl" string move instruction.
>  movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
>  movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
> @@ -492,9 +482,6 @@ set<cc>, 0xf9<cc:opc>, 0, Cpu386, Modrm|
>  // String manipulation.
>  cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  cmps, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -// Intel mode string compare.
> -cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
> -cmpsd, 0xa7, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
>  scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  scmp, 0xa6, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp0|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  ins, 0x6c, None, Cpu186, W|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
> @@ -509,9 +496,6 @@ slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|
>  slod, 0xac, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
>  movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  movs, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> -// Intel mode string move.
> -movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsString|RepPrefixOk, {}
> -movsd, 0xa5, None, Cpu386, Size32|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Dword|Unspecified|BaseIndex, Dword|Unspecified|BaseIndex }
>  smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
>  smov, 0xa4, None, 0, W|No_sSuf|No_ldSuf|IsStringEsOp1|RepPrefixOk, { Byte|Word|Dword|Qword|Unspecified|BaseIndex, Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  scas, 0xae, None, 0, W|No_sSuf|No_ldSuf|IsString|RepPrefixOk, {}
> @@ -989,13 +973,13 @@ emms, 0xf77, None, CpuMMX, No_bSuf|No_wS
>  // copying between Reg64/Mem64 and RegXMM/RegMMX, as is mandated by Intel's
>  // spec). AMD's spec, having been in existence for much longer, failed to
>  // recognize that and specified movd for 32- and 64-bit operations.
> -movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Reg32|Unspecified|BaseIndex, RegXMM }
> +movd, 0x666e, None, CpuAVX, D|Modrm|Vex128|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX|Pass2, { Reg32|Unspecified|BaseIndex, RegXMM }
>  movd, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|BaseIndex, RegXMM }
>  movd, 0x660f6e, None, CpuSSE2, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegXMM }
>  movd, 0x660f6e, None, CpuSSE2|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegXMM }
>  movd, 0xf6e, None, CpuMMX, D|Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, RegMMX }
>  movd, 0xf6e, None, CpuMMX|Cpu64, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg64|BaseIndex, RegMMX }
> -movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +movq, 0xf37e, None, CpuAVX, Load|Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX|Pass2, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movq, 0x66d6, None, CpuAVX, Modrm|Vex=1|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM }
>  movq, 0x666e, None, CpuAVX|Cpu64, D|Modrm|Vex=1|Space0F|VexW1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64|SSE2AVX, { Reg64|Unspecified|BaseIndex, RegXMM }
>  movq, 0xf30f7e, None, CpuSSE2, Load|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|Qword|BaseIndex|RegXMM, RegXMM }
> @@ -1159,7 +1143,7 @@ andpd<sse2>, 0x660f54, None, <sse2:cpu>,
>  cmp<frel>pd<sse2>, 0x660fc2, <frel:imm>, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|<frel:comm>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  cmp<frel>sd<sse2>, 0xf20fc2, <frel:imm>, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|<frel:comm>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM }
>  cmppd<sse2>, 0x660fc2, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
> -cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
> +cmpsd<sse2>, 0xf20fc2, None, <sse2:cpu>, Modrm|<sse2:scal>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Pass2, { Imm8, Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  comisd<sse2>, 0x660f2f, None, <sse2:cpu>, Modrm|<sse2:scal>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  cvtpi2pd, 0x660f2a, None, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegMMX, RegXMM }
>  cvtpi2pd, 0xf3e6, None, CpuAVX, Modrm|Vex|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> @@ -1184,7 +1168,7 @@ movlpd, 0x6613, None, CpuAVX, Modrm|Vex|
>  movlpd, 0x660f12, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex, RegXMM }
>  movmskpd<sse2>, 0x660f50, None, <sse2:cpu>, Modrm|<sse2:attr>|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|NoRex64, { RegXMM, Reg32|Reg64 }
>  movntpd<sse2>, 0x660f2b, None, <sse2:cpu>, Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, Xmmword|Unspecified|BaseIndex }
> -movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { Qword|Unspecified|BaseIndex, RegXMM }
> +movsd, 0xf210, None, CpuAVX, D|Modrm|VexLIG|Space0F|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX|Pass2, { Qword|Unspecified|BaseIndex, RegXMM }
>  movsd, 0xf210, None, CpuAVX, D|Modrm|Vex=3|Space0F|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|SSE2AVX, { RegXMM, RegXMM }
>  movsd, 0xf20f10, None, CpuSSE2, D|Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM }
>  movupd<sse2>, 0x660f10, None, <sse2:cpu>, D|Modrm|<sse2:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>

Does the new assembler work on Linux kernel which has "rep movsd"?

-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-05 23:52   ` H.J. Lu
@ 2022-10-06  6:15     ` Jan Beulich
  2022-10-06  6:58       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-06  6:15 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 06.10.2022 01:52, H.J. Lu wrote:
> Does the new assembler work on Linux kernel which has "rep movsd"?

No. And it shouldn't, as they should never have used MOVSD. The only valid
mnemonic (in AT&T syntax) is MOVSL. If you're meaning to suggest that we
continue to support MOVSD in AT&T mode, then this will - once again for
consistency - need extending to _all_ other D-suffixable insns the SDM
specifies. I can only repeat what I've said before: Consistency is a
requirement such that users can predict assembler behavior.

In the meantime I guess I'll make a patch for Linux.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-06  6:15     ` Jan Beulich
@ 2022-10-06  6:58       ` Jan Beulich
  2022-10-06 15:28         ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-06  6:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 06.10.2022 08:15, Jan Beulich via Binutils wrote:
> On 06.10.2022 01:52, H.J. Lu wrote:
>> Does the new assembler work on Linux kernel which has "rep movsd"?
> 
> No. And it shouldn't, as they should never have used MOVSD. The only valid
> mnemonic (in AT&T syntax) is MOVSL. If you're meaning to suggest that we
> continue to support MOVSD in AT&T mode, then this will - once again for
> consistency - need extending to _all_ other D-suffixable insns the SDM
> specifies. I can only repeat what I've said before: Consistency is a
> requirement such that users can predict assembler behavior.

Note how Clang's integrated assembler doesn't even support CMPSD as a
string instruction - that's imo yet more odd behavior, and likely
attributed _solely_ to the goal of wanting to work around code wrongly
using such.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-06  6:58       ` Jan Beulich
@ 2022-10-06 15:28         ` H.J. Lu
  2022-10-06 16:12           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-06 15:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 5, 2022 at 11:58 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.10.2022 08:15, Jan Beulich via Binutils wrote:
> > On 06.10.2022 01:52, H.J. Lu wrote:
> >> Does the new assembler work on Linux kernel which has "rep movsd"?
> >
> > No. And it shouldn't, as they should never have used MOVSD. The only valid
> > mnemonic (in AT&T syntax) is MOVSL. If you're meaning to suggest that we
> > continue to support MOVSD in AT&T mode, then this will - once again for
> > consistency - need extending to _all_ other D-suffixable insns the SDM
> > specifies. I can only repeat what I've said before: Consistency is a
> > requirement such that users can predict assembler behavior.
>
> Note how Clang's integrated assembler doesn't even support CMPSD as a
> string instruction - that's imo yet more odd behavior, and likely
> attributed _solely_ to the goal of wanting to work around code wrongly
> using such.

I think we should avoid changing assembly sources if possible.  Should we keep
CMPSD/MOVSD without any operands? This won't cause any confusion.


-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-06 15:28         ` H.J. Lu
@ 2022-10-06 16:12           ` Jan Beulich
  2022-10-06 18:41             ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-06 16:12 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 06.10.2022 17:28, H.J. Lu wrote:
> On Wed, Oct 5, 2022 at 11:58 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 06.10.2022 08:15, Jan Beulich via Binutils wrote:
>>> On 06.10.2022 01:52, H.J. Lu wrote:
>>>> Does the new assembler work on Linux kernel which has "rep movsd"?
>>>
>>> No. And it shouldn't, as they should never have used MOVSD. The only valid
>>> mnemonic (in AT&T syntax) is MOVSL. If you're meaning to suggest that we
>>> continue to support MOVSD in AT&T mode, then this will - once again for
>>> consistency - need extending to _all_ other D-suffixable insns the SDM
>>> specifies. I can only repeat what I've said before: Consistency is a
>>> requirement such that users can predict assembler behavior.
>>
>> Note how Clang's integrated assembler doesn't even support CMPSD as a
>> string instruction - that's imo yet more odd behavior, and likely
>> attributed _solely_ to the goal of wanting to work around code wrongly
>> using such.
> 
> I think we should avoid changing assembly sources if possible.  Should we keep
> CMPSD/MOVSD without any operands? This won't cause any confusion.

Since Clang doesn't support CMPSD, I'd be (hesitantly) okay with keeping
just the single MOVSD template having no operands. I'm still be inclined
to warn if it ends up being used, so that people can correct their code.
If you can explain why you think CMPSD also needs retaining in a similar
way, I might be talked into keeping the operand-less form there as well.
But anything going beyond that would have me fall back to requiring
consistency throughout the mnemonics a D suffix might be used with as
per vendor documentation.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-06 16:12           ` Jan Beulich
@ 2022-10-06 18:41             ` H.J. Lu
  2022-10-07 13:03               ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-06 18:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Thu, Oct 6, 2022 at 9:12 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.10.2022 17:28, H.J. Lu wrote:
> > On Wed, Oct 5, 2022 at 11:58 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 06.10.2022 08:15, Jan Beulich via Binutils wrote:
> >>> On 06.10.2022 01:52, H.J. Lu wrote:
> >>>> Does the new assembler work on Linux kernel which has "rep movsd"?
> >>>
> >>> No. And it shouldn't, as they should never have used MOVSD. The only valid
> >>> mnemonic (in AT&T syntax) is MOVSL. If you're meaning to suggest that we
> >>> continue to support MOVSD in AT&T mode, then this will - once again for
> >>> consistency - need extending to _all_ other D-suffixable insns the SDM
> >>> specifies. I can only repeat what I've said before: Consistency is a
> >>> requirement such that users can predict assembler behavior.
> >>
> >> Note how Clang's integrated assembler doesn't even support CMPSD as a
> >> string instruction - that's imo yet more odd behavior, and likely
> >> attributed _solely_ to the goal of wanting to work around code wrongly
> >> using such.
> >
> > I think we should avoid changing assembly sources if possible.  Should we keep
> > CMPSD/MOVSD without any operands? This won't cause any confusion.
>
> Since Clang doesn't support CMPSD, I'd be (hesitantly) okay with keeping
> just the single MOVSD template having no operands. I'm still be inclined
> to warn if it ends up being used, so that people can correct their code.
> If you can explain why you think CMPSD also needs retaining in a similar
> way, I might be talked into keeping the operand-less form there as well.
> But anything going beyond that would have me fall back to requiring
> consistency throughout the mnemonics a D suffix might be used with as
> per vendor documentation.

Warning is fine with me.  We should accept MOVSD and CMPSD without
operands.


-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/7] x86: re-work insn/suffix recognition
  2022-10-06 18:41             ` H.J. Lu
@ 2022-10-07 13:03               ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2022-10-07 13:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 06.10.2022 20:41, H.J. Lu wrote:
> On Thu, Oct 6, 2022 at 9:12 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 06.10.2022 17:28, H.J. Lu wrote:
>>> On Wed, Oct 5, 2022 at 11:58 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 06.10.2022 08:15, Jan Beulich via Binutils wrote:
>>>>> On 06.10.2022 01:52, H.J. Lu wrote:
>>>>>> Does the new assembler work on Linux kernel which has "rep movsd"?
>>>>>
>>>>> No. And it shouldn't, as they should never have used MOVSD. The only valid
>>>>> mnemonic (in AT&T syntax) is MOVSL. If you're meaning to suggest that we
>>>>> continue to support MOVSD in AT&T mode, then this will - once again for
>>>>> consistency - need extending to _all_ other D-suffixable insns the SDM
>>>>> specifies. I can only repeat what I've said before: Consistency is a
>>>>> requirement such that users can predict assembler behavior.
>>>>
>>>> Note how Clang's integrated assembler doesn't even support CMPSD as a
>>>> string instruction - that's imo yet more odd behavior, and likely
>>>> attributed _solely_ to the goal of wanting to work around code wrongly
>>>> using such.
>>>
>>> I think we should avoid changing assembly sources if possible.  Should we keep
>>> CMPSD/MOVSD without any operands? This won't cause any confusion.
>>
>> Since Clang doesn't support CMPSD, I'd be (hesitantly) okay with keeping
>> just the single MOVSD template having no operands. I'm still be inclined
>> to warn if it ends up being used, so that people can correct their code.
>> If you can explain why you think CMPSD also needs retaining in a similar
>> way, I might be talked into keeping the operand-less form there as well.
>> But anything going beyond that would have me fall back to requiring
>> consistency throughout the mnemonics a D suffix might be used with as
>> per vendor documentation.
> 
> Warning is fine with me.  We should accept MOVSD and CMPSD without
> operands.

As it turns out, still dropping the two templates and adding a little bit
of code to parse_insn() is easier, especially for the purpose of emitting
a warning (which otherwise would need new code elsewhere, in a perhaps
less logical place).

Before I submit v4 - are there any comments on any of the other patches
in this series?

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-05  7:24 ` [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
@ 2022-10-11 17:44   ` H.J. Lu
  2022-10-12  7:08     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-11 17:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> PR gas/29524
> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> to prefix parsing. Utilize i.error just like match_template() does.

Since movs{b,w,l,q} are string instructions, integer sign extensions
require a suffix to specify the destination size.  This is different from other
integer instructions.  Since only the new assembler allows the implicit suffix,
it won't be easy to use.  We should improve error messages, but allowing
new syntax doesn't help much.

> ---
> v3: Re-base over changes to earlier patches (incl use of Pass2).
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -236,6 +236,8 @@ enum i386_error
>      unsupported_with_intel_mnemonic,
>      unsupported_syntax,
>      unsupported,
> +    unsupported_on_arch,
> +    unsupported_64bit,
>      invalid_sib_address,
>      invalid_vsib_address,
>      invalid_vector_register_set,
> @@ -4849,6 +4851,15 @@ md_assemble (char *line)
>      {
>        if (pass1_mnem != NULL)
>         goto match_error;
> +      if (i.error != no_error)
> +       {
> +         gas_assert (current_templates != NULL);
> +         if (current_templates->start->opcode_modifier.pass2 && !i.suffix)
> +           goto no_match;
> +         /* No point in trying a 2nd pass - it'll only find the same suffix
> +            again.  */
> +         goto match_error;
> +       }
>        return;
>      }
>    if (current_templates->start->opcode_modifier.pass2)
> @@ -4948,12 +4959,21 @@ md_assemble (char *line)
>         {
>           line = copy;
>           copy = NULL;
> +  no_match:
>           pass1_err = i.error;
>           pass1_mnem = current_templates->start->name;
>           goto retry;
>         }
> -      free (copy);
> +
> +      /* If a non-/only-64bit template (group) was found in pass 1, and if
> +        _some_ template (group) was found in pass 2, squash pass 1's
> +        error.  */
> +      if (pass1_err == unsupported_64bit)
> +       pass1_mnem = NULL;
> +
>    match_error:
> +      free (copy);
> +
>        switch (pass1_mnem ? pass1_err : i.error)
>         {
>         default:
> @@ -4986,6 +5006,17 @@ md_assemble (char *line)
>           as_bad (_("unsupported instruction `%s'"),
>                   pass1_mnem ? pass1_mnem : current_templates->start->name);
>           return;
> +       case unsupported_on_arch:
> +         as_bad (_("`%s' is not supported on `%s%s'"),
> +                 pass1_mnem ? pass1_mnem : current_templates->start->name,
> +                 cpu_arch_name ? cpu_arch_name : default_arch,
> +                 cpu_sub_arch_name ? cpu_sub_arch_name : "");
> +         return;
> +       case unsupported_64bit:
> +         as_bad (_("`%s' is %s supported in 64-bit mode"),
> +                 pass1_mnem ? pass1_mnem : current_templates->start->name,
> +                 flag_code == CODE_64BIT ? _("not") : _("only"));
> +         return;
>         case invalid_sib_address:
>           err_msg = _("invalid SIB address");
>           break;
> @@ -5601,16 +5632,13 @@ parse_insn (const char *line, char *mnem
>         return l;
>      }
>
> -  if (!(supported & CPU_FLAGS_64BIT_MATCH))
> -    as_bad (flag_code == CODE_64BIT
> -           ? _("`%s' is not supported in 64-bit mode")
> -           : _("`%s' is only supported in 64-bit mode"),
> -           current_templates->start->name);
> -  else
> -    as_bad (_("`%s' is not supported on `%s%s'"),
> -           current_templates->start->name,
> -           cpu_arch_name ? cpu_arch_name : default_arch,
> -           cpu_sub_arch_name ? cpu_sub_arch_name : "");
> +  if (pass1)
> +    {
> +      if (supported & CPU_FLAGS_64BIT_MATCH)
> +        i.error = unsupported_on_arch;
> +      else
> +        i.error = unsupported_64bit;
> +    }
>
>    return NULL;
>  }
> --- a/gas/testsuite/gas/i386/movs.s
> +++ b/gas/testsuite/gas/i386/movs.s
> @@ -30,4 +30,10 @@ movs:
>  .ifdef x86_64
>         movswq  %ax,%rax
>         movswq  (%rax),%rax
> +
> +       movsl   %eax,%rax
> +       movsl   (%rax),%rax
> +
> +       movslq  %eax,%rax
> +       movslq  (%rax),%rax
>  .endif
> --- a/gas/testsuite/gas/i386/movx64.l
> +++ b/gas/testsuite/gas/i386/movx64.l
> @@ -241,6 +241,46 @@
>  [      ]*[1-9][0-9]*[  ]+movswq        %eax, %rcx
>  [      ]*[1-9][0-9]*[  ]+movswq        %rax, %rcx
>  [      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movsl %al, %cl
> +[      ]*[1-9][0-9]*[  ]+movsl %ax, %cl
> +[      ]*[1-9][0-9]*[  ]+movsl %eax, %cl
> +[      ]*[1-9][0-9]*[  ]+movsl %rax, %cl
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movsl %al, %cx
> +[      ]*[1-9][0-9]*[  ]+movsl %ax, %cx
> +[      ]*[1-9][0-9]*[  ]+movsl %eax, %cx
> +[      ]*[1-9][0-9]*[  ]+movsl %rax, %cx
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movsl %al, %ecx
> +[      ]*[1-9][0-9]*[  ]+movsl %ax, %ecx
> +[      ]*[1-9][0-9]*[  ]+movsl %eax, %ecx
> +[      ]*[1-9][0-9]*[  ]+movsl %rax, %ecx
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movsl %al, %rcx
> +[      ]*[1-9][0-9]*[  ]+movsl %ax, %rcx
> +[      ]*[1-9][0-9]* \?\?\?\? 4863C8[  ]+movsl %eax, %rcx
> +[      ]*[1-9][0-9]*[  ]+movsl %rax, %rcx
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movslq        %al, %cl
> +[      ]*[1-9][0-9]*[  ]+movslq        %ax, %cl
> +[      ]*[1-9][0-9]*[  ]+movslq        %eax, %cl
> +[      ]*[1-9][0-9]*[  ]+movslq        %rax, %cl
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movslq        %al, %cx
> +[      ]*[1-9][0-9]*[  ]+movslq        %ax, %cx
> +[      ]*[1-9][0-9]*[  ]+movslq        %eax, %cx
> +[      ]*[1-9][0-9]*[  ]+movslq        %rax, %cx
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movslq        %al, %ecx
> +[      ]*[1-9][0-9]*[  ]+movslq        %ax, %ecx
> +[      ]*[1-9][0-9]*[  ]+movslq        %eax, %ecx
> +[      ]*[1-9][0-9]*[  ]+movslq        %rax, %ecx
> +[      ]*[1-9][0-9]*[  ]*
> +[      ]*[1-9][0-9]*[  ]+movslq        %al, %rcx
> +[      ]*[1-9][0-9]*[  ]+movslq        %ax, %rcx
> +[      ]*[1-9][0-9]* \?\?\?\? 4863C8[  ]+movslq        %eax, %rcx
> +[      ]*[1-9][0-9]*[  ]+movslq        %rax, %rcx
> +[      ]*[1-9][0-9]*[  ]*
>  [      ]*[1-9][0-9]*[  ]+movzx:
>  [      ]*[1-9][0-9]*[  ]+movzx %al, %cl
>  [      ]*[1-9][0-9]*[  ]+movzx %ax, %cl
> --- a/gas/testsuite/gas/i386/movx64.s
> +++ b/gas/testsuite/gas/i386/movx64.s
> @@ -241,6 +241,46 @@ movsx:
>         movswq  %eax, %rcx
>         movswq  %rax, %rcx
>
> +       movsl   %al, %cl
> +       movsl   %ax, %cl
> +       movsl   %eax, %cl
> +       movsl   %rax, %cl
> +
> +       movsl   %al, %cx
> +       movsl   %ax, %cx
> +       movsl   %eax, %cx
> +       movsl   %rax, %cx
> +
> +       movsl   %al, %ecx
> +       movsl   %ax, %ecx
> +       movsl   %eax, %ecx
> +       movsl   %rax, %ecx
> +
> +       movsl   %al, %rcx
> +       movsl   %ax, %rcx
> +       movsl   %eax, %rcx
> +       movsl   %rax, %rcx
> +
> +       movslq  %al, %cl
> +       movslq  %ax, %cl
> +       movslq  %eax, %cl
> +       movslq  %rax, %cl
> +
> +       movslq  %al, %cx
> +       movslq  %ax, %cx
> +       movslq  %eax, %cx
> +       movslq  %rax, %cx
> +
> +       movslq  %al, %ecx
> +       movslq  %ax, %ecx
> +       movslq  %eax, %ecx
> +       movslq  %rax, %ecx
> +
> +       movslq  %al, %rcx
> +       movslq  %ax, %rcx
> +       movslq  %eax, %rcx
> +       movslq  %rax, %rcx
> +
>  movzx:
>         movzx   %al, %cl
>         movzx   %ax, %cl
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -164,9 +164,7 @@ movbe, 0x0f38f0, None, CpuMovbe, D|Modrm
>  // Move with sign extend.
>  movsb, 0xfbe, None, Cpu386, Modrm|No_bSuf|No_sSuf|No_ldSuf|Pass2, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  movsw, 0xfbf, None, Cpu386, Modrm|No_bSuf|No_wSuf|No_sSuf|No_ldSuf|Pass2, { Reg16|Unspecified|BaseIndex, Reg32|Reg64 }
> -// "movslq" must not be converted into "movsl" to avoid conflict with the
> -// "movsl" string move instruction.
> -movslq, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Size64, { Reg32|Dword|Unspecified|BaseIndex, Reg64 }
> +movsl, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_ldSuf|Pass2, { Reg32|Unspecified|BaseIndex, Reg64 }
>  movsx, 0xfbe, None, Cpu386, W|Modrm|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg8|Reg16|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  movsx, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
>  movsxd, 0x63, None, Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Reg32|Unspecified|BaseIndex, Reg32|Reg64 }
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 5/7] ix86: don't recognize/derive Q suffix in the common case
  2022-10-05  7:24 ` [PATCH v3 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
@ 2022-10-11 17:49   ` H.J. Lu
  0 siblings, 0 replies; 34+ messages in thread
From: H.J. Lu @ 2022-10-11 17:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Have its use, except where actually legitimate, result in the same "only
> supported in 64-bit mode" diagnostic as emitted for other 64-bit only
> insns. Also suppress deriving of the suffix in Intel mode except in the
> legitimate cases. This in exchange allows dropping the respective code
> from match_template().
>
> Oddly enough despite gcc's preference towards FILDQ and FIST{,T}Q we
> had no testcase whatsoever for these. Therefore such tests are being
> added. Note that the removed line in the x86-64-lfence-load testcase
> was redundant with the exact same one a few lines up.
> ---
> With gcc's preference towards FILDQ / FIST{,T}Q I wonder whether the
> disassembler wouldn't better emit a Q suffix instead of the LL one.

Since glibc uses fildll, I don't think the change is needed.

> ---
> v3: Re-base over changes to earlier patches.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -4826,7 +4826,7 @@ void
>  md_assemble (char *line)
>  {
>    unsigned int j;
> -  char mnemonic[MAX_MNEM_SIZE], mnem_suffix, *copy = NULL;
> +  char mnemonic[MAX_MNEM_SIZE], mnem_suffix = 0, *copy = NULL;
>    const char *end, *pass1_mnem = NULL;
>    enum i386_error pass1_err = 0;
>    const insn_template *t;
> @@ -4858,6 +4858,7 @@ md_assemble (char *line)
>             goto no_match;
>           /* No point in trying a 2nd pass - it'll only find the same suffix
>              again.  */
> +         mnem_suffix = i.suffix;
>           goto match_error;
>         }
>        return;
> @@ -5013,9 +5014,15 @@ md_assemble (char *line)
>                   cpu_sub_arch_name ? cpu_sub_arch_name : "");
>           return;
>         case unsupported_64bit:
> -         as_bad (_("`%s' is %s supported in 64-bit mode"),
> -                 pass1_mnem ? pass1_mnem : current_templates->start->name,
> -                 flag_code == CODE_64BIT ? _("not") : _("only"));
> +         if (ISLOWER (mnem_suffix))
> +           as_bad (_("`%s%c' is %s supported in 64-bit mode"),
> +                   pass1_mnem ? pass1_mnem : current_templates->start->name,
> +                   mnem_suffix,
> +                   flag_code == CODE_64BIT ? _("not") : _("only"));
> +         else
> +           as_bad (_("`%s' is %s supported in 64-bit mode"),
> +                   pass1_mnem ? pass1_mnem : current_templates->start->name,
> +                   flag_code == CODE_64BIT ? _("not") : _("only"));
>           return;
>         case invalid_sib_address:
>           err_msg = _("invalid SIB address");
> @@ -5358,6 +5365,23 @@ md_assemble (char *line)
>      last_insn.kind = last_insn_other;
>  }
>
> +/* The Q suffix is generally valid only in 64-bit mode, with very few
> +   exceptions: fild, fistp, fisttp, and cmpxchg8b.  Note that for fild
> +   and fisttp only one of their two templates is matched below: That's
> +   sufficient since other relevant attributes are the same between both
> +   respective templates.  */
> +static INLINE bool q_suffix_allowed(const insn_template *t)
> +{
> +  return flag_code == CODE_64BIT
> +        || (t->opcode_modifier.opcodespace == SPACE_BASE
> +            && t->base_opcode == 0xdf
> +            && (t->extension_opcode & 1)) /* fild / fistp / fisttp */
> +        || (t->opcode_modifier.opcodespace == SPACE_0F
> +            && t->base_opcode == 0xc7
> +            && t->opcode_modifier.opcodeprefix == PREFIX_NONE
> +            && t->extension_opcode == 1) /* cmpxchg8b */;
> +}
> +
>  static const char *
>  parse_insn (const char *line, char *mnemonic)
>  {
> @@ -5628,6 +5652,10 @@ parse_insn (const char *line, char *mnem
>    for (t = current_templates->start; t < current_templates->end; ++t)
>      {
>        supported |= cpu_flags_match (t);
> +
> +      if (i.suffix == QWORD_MNEM_SUFFIX && !q_suffix_allowed (t))
> +       supported &= ~CPU_FLAGS_64BIT_MATCH;
> +
>        if (supported == CPU_FLAGS_PERFECT_MATCH)
>         return l;
>      }
> @@ -6663,20 +6691,12 @@ match_template (char mnem_suffix)
>        for (j = 0; j < MAX_OPERANDS; j++)
>         operand_types[j] = t->operand_types[j];
>
> -      /* In general, don't allow
> -        - 64-bit operands outside of 64-bit mode,
> -        - 32-bit operands on pre-386.  */
> +      /* In general, don't allow 32-bit operands on pre-386.  */
>        specific_error = progress (mnem_suffix ? invalid_instruction_suffix
>                                              : operand_size_mismatch);
>        j = i.imm_operands + (t->operands > i.imm_operands + 1);
> -      if (((i.suffix == QWORD_MNEM_SUFFIX
> -           && flag_code != CODE_64BIT
> -           && !(t->opcode_modifier.opcodespace == SPACE_0F
> -                && t->base_opcode == 0xc7
> -                && t->opcode_modifier.opcodeprefix == PREFIX_NONE
> -                && t->extension_opcode == 1) /* cmpxchg8b */)
> -          || (i.suffix == LONG_MNEM_SUFFIX
> -              && !cpu_arch_flags.bitfield.cpui386))
> +      if (i.suffix == LONG_MNEM_SUFFIX
> +         && !cpu_arch_flags.bitfield.cpui386
>           && (intel_syntax
>               ? (t->opcode_modifier.mnemonicsize != IGNORESIZE
>                  && !intel_float_operand (t->name))
> --- a/gas/config/tc-i386-intel.c
> +++ b/gas/config/tc-i386-intel.c
> @@ -824,7 +824,7 @@ i386_intel_operand (char *operand_string
>                     continue;
>                   break;
>                 case QWORD_MNEM_SUFFIX:
> -                 if (t->opcode_modifier.no_qsuf)
> +                 if (t->opcode_modifier.no_qsuf || !q_suffix_allowed (t))
>                     continue;
>                   break;
>                 case SHORT_MNEM_SUFFIX:
> --- a/gas/testsuite/gas/i386/opcode.d
> +++ b/gas/testsuite/gas/i386/opcode.d
> @@ -592,6 +592,10 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    0f 4b 90 90 90 90 90    cmovnp -0x6f6f6f70\(%eax\),%edx
>  [      ]*[a-f0-9]+:    66 0f 4a 90 90 90 90 90         cmovp  -0x6f6f6f70\(%eax\),%dx
>  [      ]*[a-f0-9]+:    66 0f 4b 90 90 90 90 90         cmovnp -0x6f6f6f70\(%eax\),%dx
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
>   +[a-f0-9]+:   82 c3 01                add    \$0x1,%bl
>   +[a-f0-9]+:   82 f3 01                xor    \$0x1,%bl
>   +[a-f0-9]+:   82 d3 01                adc    \$0x1,%bl
> --- a/gas/testsuite/gas/i386/opcode.s
> +++ b/gas/testsuite/gas/i386/opcode.s
> @@ -592,6 +592,11 @@ foo:
>   cmovpe  0x90909090(%eax),%dx
>   cmovpo 0x90909090(%eax),%dx
>
> + fildq  (%eax)
> + fildll (%eax)
> + fistpq (%eax)
> + fistpll (%eax)
> +
>         .byte 0x82, 0xc3, 0x01
>         .byte 0x82, 0xf3, 0x01
>         .byte 0x82, 0xd3, 0x01
> --- a/gas/testsuite/gas/i386/opcode-intel.d
> +++ b/gas/testsuite/gas/i386/opcode-intel.d
> @@ -593,6 +593,10 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    0f 4b 90 90 90 90 90    cmovnp edx,DWORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[a-f0-9]+:    66 0f 4a 90 90 90 90 90         cmovp  dx,WORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[a-f0-9]+:    66 0f 4b 90 90 90 90 90         cmovnp dx,WORD PTR \[eax-0x6f6f6f70\]
> +[      ]*[a-f0-9]+:    df 28                   fild   QWORD PTR \[eax\]
> +[      ]*[a-f0-9]+:    df 28                   fild   QWORD PTR \[eax\]
> +[      ]*[a-f0-9]+:    df 38                   fistp  QWORD PTR \[eax\]
> +[      ]*[a-f0-9]+:    df 38                   fistp  QWORD PTR \[eax\]
>   +[a-f0-9]+:   82 c3 01                add    bl,0x1
>   +[a-f0-9]+:   82 f3 01                xor    bl,0x1
>   +[a-f0-9]+:   82 d3 01                adc    bl,0x1
> --- a/gas/testsuite/gas/i386/opcode-suffix.d
> +++ b/gas/testsuite/gas/i386/opcode-suffix.d
> @@ -593,6 +593,10 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    0f 4b 90 90 90 90 90    cmovnpl -0x6f6f6f70\(%eax\),%edx
>  [      ]*[a-f0-9]+:    66 0f 4a 90 90 90 90 90         cmovpw -0x6f6f6f70\(%eax\),%dx
>  [      ]*[a-f0-9]+:    66 0f 4b 90 90 90 90 90         cmovnpw -0x6f6f6f70\(%eax\),%dx
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 28                   fildll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
> +[      ]*[a-f0-9]+:    df 38                   fistpll \(%eax\)
>   +[a-f0-9]+:   82 c3 01                addb   \$0x1,%bl
>   +[a-f0-9]+:   82 f3 01                xorb   \$0x1,%bl
>   +[a-f0-9]+:   82 d3 01                adcb   \$0x1,%bl
> --- a/gas/testsuite/gas/i386/sse3.d
> +++ b/gas/testsuite/gas/i386/sse3.d
> @@ -13,29 +13,30 @@ Disassembly of section .text:
>    10:  df 88 90 90 90 90 [     ]*fisttps -0x6f6f6f70\(%eax\)
>    16:  db 88 90 90 90 90 [     ]*fisttpl -0x6f6f6f70\(%eax\)
>    1c:  dd 88 90 90 90 90 [     ]*fisttpll -0x6f6f6f70\(%eax\)
> -  22:  66 0f 7c 65 00 [        ]*haddpd 0x0\(%ebp\),%xmm4
> -  27:  66 0f 7c ee [   ]*haddpd %xmm6,%xmm5
> -  2b:  f2 0f 7c 37 [   ]*haddps \(%edi\),%xmm6
> -  2f:  f2 0f 7c f8 [   ]*haddps %xmm0,%xmm7
> -  33:  66 0f 7d c1 [   ]*hsubpd %xmm1,%xmm0
> -  37:  66 0f 7d 0a [   ]*hsubpd \(%edx\),%xmm1
> -  3b:  f2 0f 7d d2 [   ]*hsubps %xmm2,%xmm2
> -  3f:  f2 0f 7d 1c 24 [        ]*hsubps \(%esp\),%xmm3
> -  44:  f2 0f f0 2e [   ]*lddqu  \(%esi\),%xmm5
> -  48:  0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> -  4b:  0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> -  4e:  f2 0f 12 f7 [   ]*movddup %xmm7,%xmm6
> -  52:  f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> -  56:  f3 0f 16 01 [   ]*movshdup \(%ecx\),%xmm0
> -  5a:  f3 0f 16 ca [   ]*movshdup %xmm2,%xmm1
> -  5e:  f3 0f 12 13 [   ]*movsldup \(%ebx\),%xmm2
> -  62:  f3 0f 12 dc [   ]*movsldup %xmm4,%xmm3
> -  66:  0f 01 c9 [      ]*mwait  %eax,%ecx
> -  69:  0f 01 c9 [      ]*mwait  %eax,%ecx
> -  6c:  67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> -  70:  67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> -  74:  f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> -  78:  f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> +[      ]*[0-9a-f]+:    dd 88 90 90 90 90 [     ]*fisttpll -0x6f6f6f70\(%eax\)
> +[      ]*[0-9a-f]+:    66 0f 7c 65 00 [        ]*haddpd 0x0\(%ebp\),%xmm4
> +[      ]*[0-9a-f]+:    66 0f 7c ee [   ]*haddpd %xmm6,%xmm5
> +[      ]*[0-9a-f]+:    f2 0f 7c 37 [   ]*haddps \(%edi\),%xmm6
> +[      ]*[0-9a-f]+:    f2 0f 7c f8 [   ]*haddps %xmm0,%xmm7
> +[      ]*[0-9a-f]+:    66 0f 7d c1 [   ]*hsubpd %xmm1,%xmm0
> +[      ]*[0-9a-f]+:    66 0f 7d 0a [   ]*hsubpd \(%edx\),%xmm1
> +[      ]*[0-9a-f]+:    f2 0f 7d d2 [   ]*hsubps %xmm2,%xmm2
> +[      ]*[0-9a-f]+:    f2 0f 7d 1c 24 [        ]*hsubps \(%esp\),%xmm3
> +[      ]*[0-9a-f]+:    f2 0f f0 2e [   ]*lddqu  \(%esi\),%xmm5
> +[      ]*[0-9a-f]+:    0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> +[      ]*[0-9a-f]+:    0f 01 c8 [      ]*monitor %eax,%ecx,%edx
> +[      ]*[0-9a-f]+:    f2 0f 12 f7 [   ]*movddup %xmm7,%xmm6
> +[      ]*[0-9a-f]+:    f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> +[      ]*[0-9a-f]+:    f3 0f 16 01 [   ]*movshdup \(%ecx\),%xmm0
> +[      ]*[0-9a-f]+:    f3 0f 16 ca [   ]*movshdup %xmm2,%xmm1
> +[      ]*[0-9a-f]+:    f3 0f 12 13 [   ]*movsldup \(%ebx\),%xmm2
> +[      ]*[0-9a-f]+:    f3 0f 12 dc [   ]*movsldup %xmm4,%xmm3
> +[      ]*[0-9a-f]+:    0f 01 c9 [      ]*mwait  %eax,%ecx
> +[      ]*[0-9a-f]+:    0f 01 c9 [      ]*mwait  %eax,%ecx
> +[      ]*[0-9a-f]+:    67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> +[      ]*[0-9a-f]+:    67 0f 01 c8 [   ]*monitor %ax,%ecx,%edx
> +[      ]*[0-9a-f]+:    f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
> +[      ]*[0-9a-f]+:    f2 0f 12 38 [   ]*movddup \(%eax\),%xmm7
>  [      ]*[0-9a-f]+:    0f 01 c8[       ]+monitor %eax,%ecx,%edx
>  [      ]*[0-9a-f]+:    67 0f 01 c8[    ]+monitor %ax,%ecx,%edx
>  [      ]*[0-9a-f]+:    0f 01 c9[       ]+mwait  %eax,%ecx
> --- a/gas/testsuite/gas/i386/sse3.s
> +++ b/gas/testsuite/gas/i386/sse3.s
> @@ -8,6 +8,7 @@ foo:
>         addsubps        %xmm4,%xmm3
>         fisttps         0x90909090(%eax)
>         fisttpl         0x90909090(%eax)
> +       fisttpq         0x90909090(%eax)
>         fisttpll        0x90909090(%eax)
>         haddpd          0x0(%ebp),%xmm4
>         haddpd          %xmm6,%xmm5
> --- a/gas/testsuite/gas/i386/sse3-intel.d
> +++ b/gas/testsuite/gas/i386/sse3-intel.d
> @@ -14,6 +14,7 @@ Disassembly of section .text:
>  [      ]*[0-9a-f]+:    df 88 90 90 90 90[      ]+fisttp WORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[0-9a-f]+:    db 88 90 90 90 90[      ]+fisttp DWORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[0-9a-f]+:    dd 88 90 90 90 90[      ]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
> +[      ]*[0-9a-f]+:    dd 88 90 90 90 90[      ]+fisttp QWORD PTR \[eax-0x6f6f6f70\]
>  [      ]*[0-9a-f]+:    66 0f 7c 65 00[         ]+haddpd xmm4,(XMMWORD PTR )?\[ebp(\+0x0)\]
>  [      ]*[0-9a-f]+:    66 0f 7c ee[    ]+haddpd xmm5,xmm6
>  [      ]*[0-9a-f]+:    f2 0f 7c 37[    ]+haddps xmm6,(XMMWORD PTR )?\[edi\]
> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
> @@ -44,16 +44,21 @@ Disassembly of section .text:
>   +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   db 55 00                fistl  0x0\(%rbp\)
>   +[a-f0-9]+:   df 55 00                fists  0x0\(%rbp\)
> + +[a-f0-9]+:   db 5d 00                fistpl 0x0\(%rbp\)
> + +[a-f0-9]+:   df 5d 00                fistps 0x0\(%rbp\)
> + +[a-f0-9]+:   df 7d 00                fistpll 0x0\(%rbp\)
>   +[a-f0-9]+:   db 45 00                fildl  0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   df 45 00                filds  0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
> + +[a-f0-9]+:   df 6d 00                fildll 0x0\(%rbp\)
> + +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   9b dd 75 00             fsave  0x0\(%rbp\)
>   +[a-f0-9]+:   dd 65 00                frstor 0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
> - +[a-f0-9]+:   df 45 00                filds  0x0\(%rbp\)
> - +[a-f0-9]+:   0f ae e8                lfence
> + +[a-f0-9]+:   db 4d 00                fisttpl 0x0\(%rbp\)
>   +[a-f0-9]+:   df 4d 00                fisttps 0x0\(%rbp\)
> + +[a-f0-9]+:   dd 4d 00                fisttpll 0x0\(%rbp\)
>   +[a-f0-9]+:   d9 65 00                fldenv 0x0\(%rbp\)
>   +[a-f0-9]+:   0f ae e8                lfence
>   +[a-f0-9]+:   9b d9 75 00             fstenv 0x0\(%rbp\)
> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
> @@ -27,12 +27,17 @@ _start:
>         flds (%rbp)
>         fistl (%rbp)
>         fists (%rbp)
> +       fistpl (%rbp)
> +       fistps (%rbp)
> +       fistpq (%rbp)
>         fildl (%rbp)
>         filds (%rbp)
> +       fildq (%rbp)
>         fsave (%rbp)
>         frstor (%rbp)
> -       filds (%rbp)
> +       fisttpl (%rbp)
>         fisttps (%rbp)
> +       fisttpq (%rbp)
>         fldenv (%rbp)
>         fstenv (%rbp)
>         fadds  (%rbp)
> --- a/gas/testsuite/gas/i386/x86-64-sse3.d
> +++ b/gas/testsuite/gas/i386/x86-64-sse3.d
> @@ -13,6 +13,7 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    df 88 90 90 90 00 [     ]*fisttps 0x909090\(%rax\)
>  [      ]*[a-f0-9]+:    db 88 90 90 90 00 [     ]*fisttpl 0x909090\(%rax\)
>  [      ]*[a-f0-9]+:    dd 88 90 90 90 00 [     ]*fisttpll 0x909090\(%rax\)
> +[      ]*[a-f0-9]+:    dd 88 90 90 90 00 [     ]*fisttpll 0x909090\(%rax\)
>  [      ]*[a-f0-9]+:    66 0f 7c 65 00 [        ]*haddpd 0x0\(%rbp\),%xmm4
>  [      ]*[a-f0-9]+:    66 0f 7c ee [   ]*haddpd %xmm6,%xmm5
>  [      ]*[a-f0-9]+:    f2 0f 7c 37 [   ]*haddps \(%rdi\),%xmm6
> --- a/gas/testsuite/gas/i386/x86-64-sse3.s
> +++ b/gas/testsuite/gas/i386/x86-64-sse3.s
> @@ -8,6 +8,7 @@ foo:
>         addsubps        %xmm4,%xmm3
>         fisttps         0x909090(%rax)
>         fisttpl         0x909090(%rax)
> +       fisttpq         0x909090(%rax)
>         fisttpll        0x909090(%rax)
>         haddpd          0x0(%rbp),%xmm4
>         haddpd          %xmm6,%xmm5
> --- a/gas/testsuite/gas/i386/x86-64-sse3-intel.d
> +++ b/gas/testsuite/gas/i386/x86-64-sse3-intel.d
> @@ -14,6 +14,7 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    df 88 90 90 90 00[      ]+fisttp WORD PTR \[rax\+0x909090\]
>  [      ]*[a-f0-9]+:    db 88 90 90 90 00[      ]+fisttp DWORD PTR \[rax\+0x909090\]
>  [      ]*[a-f0-9]+:    dd 88 90 90 90 00[      ]+fisttp QWORD PTR \[rax\+0x909090\]
> +[      ]*[a-f0-9]+:    dd 88 90 90 90 00[      ]+fisttp QWORD PTR \[rax\+0x909090\]
>  [      ]*[a-f0-9]+:    66 0f 7c 65 00[         ]+haddpd xmm4,(XMMWORD PTR )?\[rbp(\+0x0)\]
>  [      ]*[a-f0-9]+:    66 0f 7c ee[    ]+haddpd xmm5,xmm6
>  [      ]*[a-f0-9]+:    f2 0f 7c 37[    ]+haddps xmm6,(XMMWORD PTR )?\[rdi\]
>

OK.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address
  2022-10-05  7:25 ` [PATCH v3 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
@ 2022-10-11 17:50   ` H.J. Lu
  0 siblings, 0 replies; 34+ messages in thread
From: H.J. Lu @ 2022-10-11 17:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 5, 2022 at 12:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> In commit 1212781b35c9 ("ix86: allow HLE store of accumulator to
> absolute address") I was wrong to exclude 64-bit code. Dropping the
> check also leads to better diagnostics in 64-bit code ("MOV", after
> all, isn't invalid with "XRELEASE").
>
> While there also limit the amount of further checks done: The operand
> type checks that were there were effectively redundant with other ones
> anyway, plus it's quite fine to also have "xrelease mov <disp>, %eax"
> look for the next MOV template (in fact again also improving
> diagnostics).
> ---
> v2: New.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -6819,12 +6819,9 @@ match_template (char mnem_suffix)
>             continue;
>           /* xrelease mov %eax, <disp> is another special case. It must not
>              match the accumulator-only encoding of mov.  */
> -         if (flag_code != CODE_64BIT
> -             && i.hle_prefix
> +         if (i.hle_prefix
>               && t->base_opcode == 0xa0
> -             && t->opcode_modifier.opcodespace == SPACE_BASE
> -             && i.types[0].bitfield.instance == Accum
> -             && (i.flags[1] & Operand_Mem))
> +             && t->opcode_modifier.opcodespace == SPACE_BASE)
>             continue;
>           /* Fall through.  */
>
> --- a/gas/testsuite/gas/i386/x86-64-hle-intel.d
> +++ b/gas/testsuite/gas/i386/x86-64-hle-intel.d
> @@ -425,6 +425,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 20 01             lock xacquire and BYTE PTR \[rcx\],al
>  [      ]*[a-f0-9]+:    f0 f3 20 01             lock xrelease and BYTE PTR \[rcx\],al
>  [      ]*[a-f0-9]+:    f3 88 01                xrelease mov BYTE PTR \[rcx\],al
> +[      ]*[a-f0-9]+:    f3 88 04 25 78 56 34 12         xrelease mov BYTE PTR (ds:)?0x12345678,al
> +[      ]*[a-f0-9]+:    67 f3 88 04 25 21 43 65 87      xrelease mov BYTE PTR \[eiz\*1\+0x87654321\],al
>  [      ]*[a-f0-9]+:    f2 f0 08 01             xacquire lock or BYTE PTR \[rcx\],al
>  [      ]*[a-f0-9]+:    f2 f0 08 01             xacquire lock or BYTE PTR \[rcx\],al
>  [      ]*[a-f0-9]+:    f3 f0 08 01             xrelease lock or BYTE PTR \[rcx\],al
> @@ -476,6 +478,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 66 21 01          lock xacquire and WORD PTR \[rcx\],ax
>  [      ]*[a-f0-9]+:    f0 f3 66 21 01          lock xrelease and WORD PTR \[rcx\],ax
>  [      ]*[a-f0-9]+:    66 f3 89 01             xrelease mov WORD PTR \[rcx\],ax
> +[      ]*[a-f0-9]+:    66 f3 89 04 25 78 56 34 12      xrelease mov WORD PTR (ds:)?0x12345678,ax
> +[      ]*[a-f0-9]+:    67 66 f3 89 04 25 21 43 65 87   xrelease mov WORD PTR \[eiz\*1\+0x87654321\],ax
>  [      ]*[a-f0-9]+:    66 f2 f0 09 01          xacquire lock or WORD PTR \[rcx\],ax
>  [      ]*[a-f0-9]+:    66 f2 f0 09 01          xacquire lock or WORD PTR \[rcx\],ax
>  [      ]*[a-f0-9]+:    66 f3 f0 09 01          xrelease lock or WORD PTR \[rcx\],ax
> @@ -527,6 +531,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 21 01             lock xacquire and DWORD PTR \[rcx\],eax
>  [      ]*[a-f0-9]+:    f0 f3 21 01             lock xrelease and DWORD PTR \[rcx\],eax
>  [      ]*[a-f0-9]+:    f3 89 01                xrelease mov DWORD PTR \[rcx\],eax
> +[      ]*[a-f0-9]+:    f3 89 04 25 78 56 34 12         xrelease mov DWORD PTR (ds:)?0x12345678,eax
> +[      ]*[a-f0-9]+:    67 f3 89 04 25 21 43 65 87      xrelease mov DWORD PTR \[eiz\*1\+0x87654321\],eax
>  [      ]*[a-f0-9]+:    f2 f0 09 01             xacquire lock or DWORD PTR \[rcx\],eax
>  [      ]*[a-f0-9]+:    f2 f0 09 01             xacquire lock or DWORD PTR \[rcx\],eax
>  [      ]*[a-f0-9]+:    f3 f0 09 01             xrelease lock or DWORD PTR \[rcx\],eax
> @@ -578,6 +584,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 48 21 01          lock xacquire and QWORD PTR \[rcx\],rax
>  [      ]*[a-f0-9]+:    f0 f3 48 21 01          lock xrelease and QWORD PTR \[rcx\],rax
>  [      ]*[a-f0-9]+:    f3 48 89 01             xrelease mov QWORD PTR \[rcx\],rax
> +[      ]*[a-f0-9]+:    f3 48 89 04 25 78 56 34 12      xrelease mov QWORD PTR (ds:)?0x12345678,rax
> +[      ]*[a-f0-9]+:    67 f3 48 89 04 25 21 43 65 87   xrelease mov QWORD PTR \[eiz\*1\+0x87654321\],rax
>  [      ]*[a-f0-9]+:    f2 f0 48 09 01          xacquire lock or QWORD PTR \[rcx\],rax
>  [      ]*[a-f0-9]+:    f2 f0 48 09 01          xacquire lock or QWORD PTR \[rcx\],rax
>  [      ]*[a-f0-9]+:    f3 f0 48 09 01          xrelease lock or QWORD PTR \[rcx\],rax
> --- a/gas/testsuite/gas/i386/x86-64-hle.d
> +++ b/gas/testsuite/gas/i386/x86-64-hle.d
> @@ -424,6 +424,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 20 01             lock xacquire and %al,\(%rcx\)
>  [      ]*[a-f0-9]+:    f0 f3 20 01             lock xrelease and %al,\(%rcx\)
>  [      ]*[a-f0-9]+:    f3 88 01                xrelease mov %al,\(%rcx\)
> +[      ]*[a-f0-9]+:    f3 88 04 25 78 56 34 12         xrelease mov %al,0x12345678
> +[      ]*[a-f0-9]+:    67 f3 88 04 25 21 43 65 87      xrelease mov %al,0x87654321\(,%eiz,1\)
>  [      ]*[a-f0-9]+:    f2 f0 08 01             xacquire lock or %al,\(%rcx\)
>  [      ]*[a-f0-9]+:    f2 f0 08 01             xacquire lock or %al,\(%rcx\)
>  [      ]*[a-f0-9]+:    f3 f0 08 01             xrelease lock or %al,\(%rcx\)
> @@ -475,6 +477,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 66 21 01          lock xacquire and %ax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f0 f3 66 21 01          lock xrelease and %ax,\(%rcx\)
>  [      ]*[a-f0-9]+:    66 f3 89 01             xrelease mov %ax,\(%rcx\)
> +[      ]*[a-f0-9]+:    66 f3 89 04 25 78 56 34 12      xrelease mov %ax,0x12345678
> +[      ]*[a-f0-9]+:    67 66 f3 89 04 25 21 43 65 87   xrelease mov %ax,0x87654321\(,%eiz,1\)
>  [      ]*[a-f0-9]+:    66 f2 f0 09 01          xacquire lock or %ax,\(%rcx\)
>  [      ]*[a-f0-9]+:    66 f2 f0 09 01          xacquire lock or %ax,\(%rcx\)
>  [      ]*[a-f0-9]+:    66 f3 f0 09 01          xrelease lock or %ax,\(%rcx\)
> @@ -526,6 +530,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 21 01             lock xacquire and %eax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f0 f3 21 01             lock xrelease and %eax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f3 89 01                xrelease mov %eax,\(%rcx\)
> +[      ]*[a-f0-9]+:    f3 89 04 25 78 56 34 12         xrelease mov %eax,0x12345678
> +[      ]*[a-f0-9]+:    67 f3 89 04 25 21 43 65 87      xrelease mov %eax,0x87654321\(,%eiz,1\)
>  [      ]*[a-f0-9]+:    f2 f0 09 01             xacquire lock or %eax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f2 f0 09 01             xacquire lock or %eax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f3 f0 09 01             xrelease lock or %eax,\(%rcx\)
> @@ -577,6 +583,8 @@ Disassembly of section .text:
>  [      ]*[a-f0-9]+:    f0 f2 48 21 01          lock xacquire and %rax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f0 f3 48 21 01          lock xrelease and %rax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f3 48 89 01             xrelease mov %rax,\(%rcx\)
> +[      ]*[a-f0-9]+:    f3 48 89 04 25 78 56 34 12      xrelease mov %rax,0x12345678
> +[      ]*[a-f0-9]+:    67 f3 48 89 04 25 21 43 65 87   xrelease mov %rax,0x87654321\(,%eiz,1\)
>  [      ]*[a-f0-9]+:    f2 f0 48 09 01          xacquire lock or %rax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f2 f0 48 09 01          xacquire lock or %rax,\(%rcx\)
>  [      ]*[a-f0-9]+:    f3 f0 48 09 01          xrelease lock or %rax,\(%rcx\)
> --- a/gas/testsuite/gas/i386/x86-64-hle.s
> +++ b/gas/testsuite/gas/i386/x86-64-hle.s
> @@ -442,6 +442,8 @@ _start:
>         .byte 0xf0; .byte 0xf2; andb %al,(%rcx)
>         .byte 0xf0; .byte 0xf3; andb %al,(%rcx)
>         xrelease movb %al,(%rcx)
> +       xrelease movb %al,0x12345678
> +       xrelease addr32 movb %al,0x87654321
>         xacquire lock orb %al,(%rcx)
>         lock xacquire orb %al,(%rcx)
>         xrelease lock orb %al,(%rcx)
> @@ -496,6 +498,8 @@ _start:
>         .byte 0xf0; .byte 0xf2; andw %ax,(%rcx)
>         .byte 0xf0; .byte 0xf3; andw %ax,(%rcx)
>         xrelease movw %ax,(%rcx)
> +       xrelease movw %ax,0x12345678
> +       xrelease addr32 movw %ax,0x87654321
>         xacquire lock orw %ax,(%rcx)
>         lock xacquire orw %ax,(%rcx)
>         xrelease lock orw %ax,(%rcx)
> @@ -550,6 +554,8 @@ _start:
>         .byte 0xf0; .byte 0xf2; andl %eax,(%rcx)
>         .byte 0xf0; .byte 0xf3; andl %eax,(%rcx)
>         xrelease movl %eax,(%rcx)
> +       xrelease movl %eax,0x12345678
> +       xrelease addr32 movl %eax,0x87654321
>         xacquire lock orl %eax,(%rcx)
>         lock xacquire orl %eax,(%rcx)
>         xrelease lock orl %eax,(%rcx)
> @@ -604,6 +610,8 @@ _start:
>         .byte 0xf0; .byte 0xf2; andq %rax,(%rcx)
>         .byte 0xf0; .byte 0xf3; andq %rax,(%rcx)
>         xrelease movq %rax,(%rcx)
> +       xrelease movq %rax,0x12345678
> +       xrelease addr32 movq %rax,0x87654321
>         xacquire lock orq %rax,(%rcx)
>         lock xacquire orq %rax,(%rcx)
>         xrelease lock orq %rax,(%rcx)
>

OK.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check
  2022-10-05  7:25 ` [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
@ 2022-10-11 17:57   ` H.J. Lu
  2022-10-12  7:13     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-11 17:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 5, 2022 at 12:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Having it in match_template() is unhelpful. Neither does looking for the
> next template to possibly match make any sense in that case, nor is the
> resulting diagnostic making clear what the problem is.
>
> While moving the check, also generalize it to include all SIMD and VEX-
> encoded insns. This way an existing conditional can be re-used in
> md_assemble(). Note though that this still leaves a lof of insns which
> are also wrong to use with these relocations.

These TLS instruction sequences are generated by compilers.
The check is only meant to check for invalid master registers.

> Further fold the remaining check (BFD_RELOC_386_GOT32) with the XRELEASE
> related one a few lines down. This again allows re-using an existing
> conditional.
> ---
> Is the set of relocations enumerated here actually complete?
> ---
> v2: New.
>
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -5119,14 +5119,30 @@ md_assemble (char *line)
>        return;
>      }
>
> -  /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
> -  if (i.prefix[DATA_PREFIX]
> -      && (is_any_vex_encoding (&i.tm)
> -         || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
> -         || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX))
> +  if (is_any_vex_encoding (&i.tm)
> +      || i.tm.operand_types[i.imm_operands].bitfield.class >= RegMMX
> +      || i.tm.operand_types[i.imm_operands + 1].bitfield.class >= RegMMX)
>      {
> -      as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
> -      return;
> +      /* Check for data size prefix on VEX/XOP/EVEX encoded and SIMD insns.  */
> +      if (i.prefix[DATA_PREFIX])
> +       {
> +         as_bad (_("data size prefix invalid with `%s'"), i.tm.name);
> +         return;
> +       }
> +
> +      /* Don't allow e.g. KMOV in TLS code sequences.  */
> +      for (j = i.imm_operands; j < i.operands; ++j)
> +       switch (i.reloc[j])
> +         {
> +         case BFD_RELOC_386_TLS_GOTIE:
> +         case BFD_RELOC_386_TLS_LE_32:
> +         case BFD_RELOC_X86_64_GOTTPOFF:
> +         case BFD_RELOC_X86_64_TLSLD:
> +           as_bad (_("TLS relocation cannot be used with `%s'"), i.tm.name);
> +           return;
> +         default:
> +           break;
> +         }
>      }
>
>    /* Check if HLE prefix is OK.  */
> @@ -6767,26 +6783,6 @@ match_template (char mnem_suffix)
>             }
>         }
>
> -      switch (i.reloc[0])
> -       {
> -       case BFD_RELOC_386_GOT32:
> -         /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
> -         if (t->base_opcode == 0xa0
> -             && t->opcode_modifier.opcodespace == SPACE_BASE)
> -           continue;
> -         break;
> -       case BFD_RELOC_386_TLS_GOTIE:
> -       case BFD_RELOC_386_TLS_LE_32:
> -       case BFD_RELOC_X86_64_GOTTPOFF:
> -       case BFD_RELOC_X86_64_TLSLD:
> -         /* Don't allow KMOV in TLS code sequences.  */
> -         if (t->opcode_modifier.vex)
> -           continue;
> -         break;
> -       default:
> -         break;
> -       }
> -
>        /* We check register size if needed.  */
>        if (t->opcode_modifier.checkregsize)
>         {
> @@ -6817,12 +6813,19 @@ match_template (char mnem_suffix)
>               && i.types[1].bitfield.instance == Accum
>               && i.types[1].bitfield.dword)
>             continue;
> -         /* xrelease mov %eax, <disp> is another special case. It must not
> -            match the accumulator-only encoding of mov.  */
> -         if (i.hle_prefix
> -             && t->base_opcode == 0xa0
> +
> +         if (t->base_opcode == MOV_AX_DISP32
>               && t->opcode_modifier.opcodespace == SPACE_BASE)
> -           continue;
> +           {
> +             /* Force 0x8b encoding for "mov foo@GOT, %eax".  */
> +             if (i.reloc[0] == BFD_RELOC_386_GOT32)
> +               continue;
> +
> +             /* xrelease mov %eax, <disp> is another special case. It must not
> +                match the accumulator-only encoding of mov.  */
> +             if (i.hle_prefix)
> +               continue;
> +           }
>           /* Fall through.  */
>
>         case 3:
>

OK.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-11 17:44   ` H.J. Lu
@ 2022-10-12  7:08     ` Jan Beulich
  2022-10-12 17:10       ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-12  7:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 11.10.2022 19:44, H.J. Lu wrote:
> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> PR gas/29524
>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>> to prefix parsing. Utilize i.error just like match_template() does.
> 
> Since movs{b,w,l,q} are string instructions, integer sign extensions
> require a suffix to specify the destination size.  This is different from other
> integer instructions.  Since only the new assembler allows the implicit suffix,
> it won't be easy to use.  We should improve error messages, but allowing
> new syntax doesn't help much.

It is an earlier change making most of this consistent with MOVZ*; it is
only logical to extend this to the long-to-quad sign-extending insn. As
with any fixes to prior misbehavior - of course one needs to play by the
rules of the older assembler for a number of years. But projects raise
their baselines, and hence at some point projects with an "avoid suffixes
if possible" policy could switch.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check
  2022-10-11 17:57   ` H.J. Lu
@ 2022-10-12  7:13     ` Jan Beulich
  2022-10-12 17:02       ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-12  7:13 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 11.10.2022 19:57, H.J. Lu wrote:
> On Wed, Oct 5, 2022 at 12:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Having it in match_template() is unhelpful. Neither does looking for the
>> next template to possibly match make any sense in that case, nor is the
>> resulting diagnostic making clear what the problem is.
>>
>> While moving the check, also generalize it to include all SIMD and VEX-
>> encoded insns. This way an existing conditional can be re-used in
>> md_assemble(). Note though that this still leaves a lof of insns which
>> are also wrong to use with these relocations.
> 
> These TLS instruction sequences are generated by compilers.
> The check is only meant to check for invalid master registers.

The sequence involving a mask register was generated (aiui) by mistake.
Why would it not be possible to see another similar mistake involve an
XMM register, with a legacy, VEX-, or EVEX-encoded insn?

Anyway - thanks for approving the patch.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check
  2022-10-12  7:13     ` Jan Beulich
@ 2022-10-12 17:02       ` H.J. Lu
  0 siblings, 0 replies; 34+ messages in thread
From: H.J. Lu @ 2022-10-12 17:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 12, 2022 at 12:13 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 11.10.2022 19:57, H.J. Lu wrote:
> > On Wed, Oct 5, 2022 at 12:25 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> Having it in match_template() is unhelpful. Neither does looking for the
> >> next template to possibly match make any sense in that case, nor is the
> >> resulting diagnostic making clear what the problem is.
> >>
> >> While moving the check, also generalize it to include all SIMD and VEX-
> >> encoded insns. This way an existing conditional can be re-used in
> >> md_assemble(). Note though that this still leaves a lof of insns which
> >> are also wrong to use with these relocations.
> >
> > These TLS instruction sequences are generated by compilers.
> > The check is only meant to check for invalid master registers.
>
> The sequence involving a mask register was generated (aiui) by mistake.
> Why would it not be possible to see another similar mistake involve an
> XMM register, with a legacy, VEX-, or EVEX-encoded insn?

We haven't seen it yet.

> Anyway - thanks for approving the patch.
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-12  7:08     ` Jan Beulich
@ 2022-10-12 17:10       ` H.J. Lu
  2022-10-13  6:08         ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-12 17:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 11.10.2022 19:44, H.J. Lu wrote:
> > On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> PR gas/29524
> >> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> >> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> >> to prefix parsing. Utilize i.error just like match_template() does.
> >
> > Since movs{b,w,l,q} are string instructions, integer sign extensions
> > require a suffix to specify the destination size.  This is different from other
> > integer instructions.  Since only the new assembler allows the implicit suffix,
> > it won't be easy to use.  We should improve error messages, but allowing
> > new syntax doesn't help much.
>
> It is an earlier change making most of this consistent with MOVZ*; it is

MOVZ is different.  There are no MOVZ string instructions.  MOVS has
different meanings in ISA.   MOVS difference from MOVZ in assembly
syntax should be expected.

> only logical to extend this to the long-to-quad sign-extending insn. As
> with any fixes to prior misbehavior - of course one needs to play by the
> rules of the older assembler for a number of years. But projects raise
> their baselines, and hence at some point projects with an "avoid suffixes
> if possible" policy could switch.
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-12 17:10       ` H.J. Lu
@ 2022-10-13  6:08         ` Jan Beulich
  2022-10-13 17:00           ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-13  6:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 12.10.2022 19:10, H.J. Lu wrote:
> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 11.10.2022 19:44, H.J. Lu wrote:
>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> PR gas/29524
>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>>>> to prefix parsing. Utilize i.error just like match_template() does.
>>>
>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
>>> require a suffix to specify the destination size.  This is different from other
>>> integer instructions.  Since only the new assembler allows the implicit suffix,
>>> it won't be easy to use.  We should improve error messages, but allowing
>>> new syntax doesn't help much.
>>
>> It is an earlier change making most of this consistent with MOVZ*; it is
> 
> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
> different meanings in ISA.   MOVS difference from MOVZ in assembly
> syntax should be expected.

You've said so before, yes, but I continue to disagree. And as we can see
from the series things can be made work consistently (and imo nothing else
should have been done right from the beginning).

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-13  6:08         ` Jan Beulich
@ 2022-10-13 17:00           ` H.J. Lu
  2022-10-14  7:03             ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-13 17:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 12.10.2022 19:10, H.J. Lu wrote:
> > On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 11.10.2022 19:44, H.J. Lu wrote:
> >>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> PR gas/29524
> >>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> >>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> >>>> to prefix parsing. Utilize i.error just like match_template() does.
> >>>
> >>> Since movs{b,w,l,q} are string instructions, integer sign extensions
> >>> require a suffix to specify the destination size.  This is different from other
> >>> integer instructions.  Since only the new assembler allows the implicit suffix,
> >>> it won't be easy to use.  We should improve error messages, but allowing
> >>> new syntax doesn't help much.
> >>
> >> It is an earlier change making most of this consistent with MOVZ*; it is
> >
> > MOVZ is different.  There are no MOVZ string instructions.  MOVS has
> > different meanings in ISA.   MOVS difference from MOVZ in assembly
> > syntax should be expected.
>
> You've said so before, yes, but I continue to disagree. And as we can see
> from the series things can be made work consistently (and imo nothing else
> should have been done right from the beginning).
>

There are inconsistencies in ISA.  AT&T syntax makes things more complex.
People should either deal with it or leave it to compilers.   I don't think we
should make assembler more complex.

-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-13 17:00           ` H.J. Lu
@ 2022-10-14  7:03             ` Jan Beulich
  2022-10-14 17:07               ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-14  7:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 13.10.2022 19:00, H.J. Lu wrote:
> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 12.10.2022 19:10, H.J. Lu wrote:
>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 11.10.2022 19:44, H.J. Lu wrote:
>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> PR gas/29524
>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
>>>>>
>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
>>>>> require a suffix to specify the destination size.  This is different from other
>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
>>>>> it won't be easy to use.  We should improve error messages, but allowing
>>>>> new syntax doesn't help much.
>>>>
>>>> It is an earlier change making most of this consistent with MOVZ*; it is
>>>
>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
>>> syntax should be expected.
>>
>> You've said so before, yes, but I continue to disagree. And as we can see
>> from the series things can be made work consistently (and imo nothing else
>> should have been done right from the beginning).
>>
> 
> There are inconsistencies in ISA.

Sure. But we shouldn't add further ones in the assembler.

>  AT&T syntax makes things more complex.
> People should either deal with it or leave it to compilers.   I don't think we
> should make assembler more complex.

The complexity added here isn't all that bad. You've added far more
complexity in the past for things which arguably shouldn't even be
dealt with by the assembler (I'm thinking of -malign-branch* and
-mlfence-* first of all).

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-14  7:03             ` Jan Beulich
@ 2022-10-14 17:07               ` H.J. Lu
  2022-10-17  7:02                 ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-14 17:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 13.10.2022 19:00, H.J. Lu wrote:
> > On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 12.10.2022 19:10, H.J. Lu wrote:
> >>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 11.10.2022 19:44, H.J. Lu wrote:
> >>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> PR gas/29524
> >>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> >>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> >>>>>> to prefix parsing. Utilize i.error just like match_template() does.
> >>>>>
> >>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
> >>>>> require a suffix to specify the destination size.  This is different from other
> >>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
> >>>>> it won't be easy to use.  We should improve error messages, but allowing
> >>>>> new syntax doesn't help much.
> >>>>
> >>>> It is an earlier change making most of this consistent with MOVZ*; it is
> >>>
> >>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
> >>> different meanings in ISA.   MOVS difference from MOVZ in assembly
> >>> syntax should be expected.
> >>
> >> You've said so before, yes, but I continue to disagree. And as we can see
> >> from the series things can be made work consistently (and imo nothing else
> >> should have been done right from the beginning).
> >>
> >
> > There are inconsistencies in ISA.
>
> Sure. But we shouldn't add further ones in the assembler.

Assembler just follows ISA.  Programmers should learn to
deal with it or use a compiler.

> >  AT&T syntax makes things more complex.
> > People should either deal with it or leave it to compilers.   I don't think we
> > should make assembler more complex.
>
> The complexity added here isn't all that bad. You've added far more
> complexity in the past for things which arguably shouldn't even be
> dealt with by the assembler (I'm thinking of -malign-branch* and
> -mlfence-* first of all).
>
> Jan



-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-14 17:07               ` H.J. Lu
@ 2022-10-17  7:02                 ` Jan Beulich
  2022-10-17 22:36                   ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-17  7:02 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 14.10.2022 19:07, H.J. Lu wrote:
> On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 13.10.2022 19:00, H.J. Lu wrote:
>>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 12.10.2022 19:10, H.J. Lu wrote:
>>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
>>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> PR gas/29524
>>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
>>>>>>>
>>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
>>>>>>> require a suffix to specify the destination size.  This is different from other
>>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
>>>>>>> it won't be easy to use.  We should improve error messages, but allowing
>>>>>>> new syntax doesn't help much.
>>>>>>
>>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
>>>>>
>>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
>>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
>>>>> syntax should be expected.
>>>>
>>>> You've said so before, yes, but I continue to disagree. And as we can see
>>>> from the series things can be made work consistently (and imo nothing else
>>>> should have been done right from the beginning).
>>>>
>>>
>>> There are inconsistencies in ISA.
>>
>> Sure. But we shouldn't add further ones in the assembler.
> 
> Assembler just follows ISA.  Programmers should learn to
> deal with it or use a compiler.

This is entirely non-constructive. Assembler writers should get things into
usable (read: consistent) shape. Plus what ISA are you talking about here?
We're talking of mnemonics which aren't spelled out in any ISA document
anyway. The only halfway official AT&T doc I'm aware of doesn't provide
room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
and elsewhere (recently: CMPccXADD) you're even suggesting to force people
to omit suffixes (plus you've previously objected to the disassembler to
consistently emit them in suffix-always mode).

That same document was also only updated to cover 64-bit code in a half-
hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
list any form of MOVSXD at all afaics, for example). Where not explicitly
mentioned, their intended handling can only be inferred by using analogies.
Nor do we support some of the odd (quirky I would say) mnemonics that are
listed there, like xchglA or movabsbA (which is even wrongly described as
moving an immediate value into the register).

Bottom line: May I please ask that you take another (constructive) look at
the v4 submission?

Jan

[1] Really it does, by saying "long" is then implied, which has never been
the behavior of gas when a register saying otherwise is also in use.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-17  7:02                 ` Jan Beulich
@ 2022-10-17 22:36                   ` H.J. Lu
  2022-10-18  6:31                     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-17 22:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Oct 17, 2022 at 12:02 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.10.2022 19:07, H.J. Lu wrote:
> > On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 13.10.2022 19:00, H.J. Lu wrote:
> >>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 12.10.2022 19:10, H.J. Lu wrote:
> >>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
> >>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>
> >>>>>>>> PR gas/29524
> >>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> >>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> >>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
> >>>>>>>
> >>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
> >>>>>>> require a suffix to specify the destination size.  This is different from other
> >>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
> >>>>>>> it won't be easy to use.  We should improve error messages, but allowing
> >>>>>>> new syntax doesn't help much.
> >>>>>>
> >>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
> >>>>>
> >>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
> >>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
> >>>>> syntax should be expected.
> >>>>
> >>>> You've said so before, yes, but I continue to disagree. And as we can see
> >>>> from the series things can be made work consistently (and imo nothing else
> >>>> should have been done right from the beginning).
> >>>>
> >>>
> >>> There are inconsistencies in ISA.
> >>
> >> Sure. But we shouldn't add further ones in the assembler.
> >
> > Assembler just follows ISA.  Programmers should learn to
> > deal with it or use a compiler.
>
> This is entirely non-constructive. Assembler writers should get things into
> usable (read: consistent) shape. Plus what ISA are you talking about here?

GNU assembler has been this way for a long time and the current GNU
assembler will still be in use for the next few years.  Assembler writers
should know about all these.

> We're talking of mnemonics which aren't spelled out in any ISA document
> anyway. The only halfway official AT&T doc I'm aware of doesn't provide
> room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
> and elsewhere (recently: CMPccXADD) you're even suggesting to force people
> to omit suffixes (plus you've previously objected to the disassembler to
> consistently emit them in suffix-always mode).
>
> That same document was also only updated to cover 64-bit code in a half-
> hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
> list any form of MOVSXD at all afaics, for example). Where not explicitly
> mentioned, their intended handling can only be inferred by using analogies.
> Nor do we support some of the odd (quirky I would say) mnemonics that are
> listed there, like xchglA or movabsbA (which is even wrongly described as
> moving an immediate value into the register).
>
> Bottom line: May I please ask that you take another (constructive) look at
> the v4 submission?
>
> Jan
>
> [1] Really it does, by saying "long" is then implied, which has never been
> the behavior of gas when a register saying otherwise is also in use.



-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-17 22:36                   ` H.J. Lu
@ 2022-10-18  6:31                     ` Jan Beulich
  2022-10-18 21:48                       ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-18  6:31 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 18.10.2022 00:36, H.J. Lu wrote:
> On Mon, Oct 17, 2022 at 12:02 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 14.10.2022 19:07, H.J. Lu wrote:
>>> On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 13.10.2022 19:00, H.J. Lu wrote:
>>>>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 12.10.2022 19:10, H.J. Lu wrote:
>>>>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
>>>>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>
>>>>>>>>>> PR gas/29524
>>>>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>>>>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>>>>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
>>>>>>>>>
>>>>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
>>>>>>>>> require a suffix to specify the destination size.  This is different from other
>>>>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
>>>>>>>>> it won't be easy to use.  We should improve error messages, but allowing
>>>>>>>>> new syntax doesn't help much.
>>>>>>>>
>>>>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
>>>>>>>
>>>>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
>>>>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
>>>>>>> syntax should be expected.
>>>>>>
>>>>>> You've said so before, yes, but I continue to disagree. And as we can see
>>>>>> from the series things can be made work consistently (and imo nothing else
>>>>>> should have been done right from the beginning).
>>>>>>
>>>>>
>>>>> There are inconsistencies in ISA.
>>>>
>>>> Sure. But we shouldn't add further ones in the assembler.
>>>
>>> Assembler just follows ISA.  Programmers should learn to
>>> deal with it or use a compiler.
>>
>> This is entirely non-constructive. Assembler writers should get things into
>> usable (read: consistent) shape. Plus what ISA are you talking about here?
> 
> GNU assembler has been this way for a long time and the current GNU
> assembler will still be in use for the next few years.  Assembler writers
> should know about all these.

Hmm, so after all not any ISA to follow? Plus do you suggest there's
only people having written x86 assembler code for many years? And
there's no people who would prefer to get their code into more
consistent shape, but who are limited by assembler shortcomings?

In any event, ...

>> We're talking of mnemonics which aren't spelled out in any ISA document
>> anyway. The only halfway official AT&T doc I'm aware of doesn't provide
>> room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
>> and elsewhere (recently: CMPccXADD) you're even suggesting to force people
>> to omit suffixes (plus you've previously objected to the disassembler to
>> consistently emit them in suffix-always mode).
>>
>> That same document was also only updated to cover 64-bit code in a half-
>> hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
>> list any form of MOVSXD at all afaics, for example). Where not explicitly
>> mentioned, their intended handling can only be inferred by using analogies.
>> Nor do we support some of the odd (quirky I would say) mnemonics that are
>> listed there, like xchglA or movabsbA (which is even wrongly described as
>> moving an immediate value into the register).
>>
>> Bottom line: May I please ask that you take another (constructive) look at
>> the v4 submission?

... this request continues to stand.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-18  6:31                     ` Jan Beulich
@ 2022-10-18 21:48                       ` H.J. Lu
  2022-10-19  6:08                         ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-18 21:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Mon, Oct 17, 2022 at 11:31 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.10.2022 00:36, H.J. Lu wrote:
> > On Mon, Oct 17, 2022 at 12:02 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 14.10.2022 19:07, H.J. Lu wrote:
> >>> On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 13.10.2022 19:00, H.J. Lu wrote:
> >>>>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 12.10.2022 19:10, H.J. Lu wrote:
> >>>>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>
> >>>>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
> >>>>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> PR gas/29524
> >>>>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> >>>>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> >>>>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
> >>>>>>>>>
> >>>>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
> >>>>>>>>> require a suffix to specify the destination size.  This is different from other
> >>>>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
> >>>>>>>>> it won't be easy to use.  We should improve error messages, but allowing
> >>>>>>>>> new syntax doesn't help much.
> >>>>>>>>
> >>>>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
> >>>>>>>
> >>>>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
> >>>>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
> >>>>>>> syntax should be expected.
> >>>>>>
> >>>>>> You've said so before, yes, but I continue to disagree. And as we can see
> >>>>>> from the series things can be made work consistently (and imo nothing else
> >>>>>> should have been done right from the beginning).
> >>>>>>
> >>>>>
> >>>>> There are inconsistencies in ISA.
> >>>>
> >>>> Sure. But we shouldn't add further ones in the assembler.
> >>>
> >>> Assembler just follows ISA.  Programmers should learn to
> >>> deal with it or use a compiler.
> >>
> >> This is entirely non-constructive. Assembler writers should get things into
> >> usable (read: consistent) shape. Plus what ISA are you talking about here?
> >
> > GNU assembler has been this way for a long time and the current GNU
> > assembler will still be in use for the next few years.  Assembler writers
> > should know about all these.
>
> Hmm, so after all not any ISA to follow? Plus do you suggest there's
> only people having written x86 assembler code for many years? And
> there's no people who would prefer to get their code into more
> consistent shape, but who are limited by assembler shortcomings?

I prefer consistency with existing assemblers and ISA specs.
For new instructions, suffixes should be used only when there is
an ambiguity.  For existing instructions, to support existing assembly
codes, suffixes may be optional even when there is an ambiguity.
But for integer MOVS instructions, suffixes have always been required.
I don't think we should change them now.  We can improve documentation
if needed.

> In any event, ...
>
> >> We're talking of mnemonics which aren't spelled out in any ISA document
> >> anyway. The only halfway official AT&T doc I'm aware of doesn't provide
> >> room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
> >> and elsewhere (recently: CMPccXADD) you're even suggesting to force people
> >> to omit suffixes (plus you've previously objected to the disassembler to
> >> consistently emit them in suffix-always mode).
> >>
> >> That same document was also only updated to cover 64-bit code in a half-
> >> hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
> >> list any form of MOVSXD at all afaics, for example). Where not explicitly
> >> mentioned, their intended handling can only be inferred by using analogies.
> >> Nor do we support some of the odd (quirky I would say) mnemonics that are
> >> listed there, like xchglA or movabsbA (which is even wrongly described as
> >> moving an immediate value into the register).
> >>
> >> Bottom line: May I please ask that you take another (constructive) look at
> >> the v4 submission?
>
> ... this request continues to stand.

I think we should improve diagnostics and documentation, not add new
syntaxes to existing instructions.
-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-18 21:48                       ` H.J. Lu
@ 2022-10-19  6:08                         ` Jan Beulich
  2022-10-19 21:46                           ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2022-10-19  6:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 18.10.2022 23:48, H.J. Lu wrote:
> On Mon, Oct 17, 2022 at 11:31 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 18.10.2022 00:36, H.J. Lu wrote:
>>> On Mon, Oct 17, 2022 at 12:02 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 14.10.2022 19:07, H.J. Lu wrote:
>>>>> On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 13.10.2022 19:00, H.J. Lu wrote:
>>>>>>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 12.10.2022 19:10, H.J. Lu wrote:
>>>>>>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
>>>>>>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> PR gas/29524
>>>>>>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>>>>>>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>>>>>>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
>>>>>>>>>>>
>>>>>>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
>>>>>>>>>>> require a suffix to specify the destination size.  This is different from other
>>>>>>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
>>>>>>>>>>> it won't be easy to use.  We should improve error messages, but allowing
>>>>>>>>>>> new syntax doesn't help much.
>>>>>>>>>>
>>>>>>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
>>>>>>>>>
>>>>>>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
>>>>>>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
>>>>>>>>> syntax should be expected.
>>>>>>>>
>>>>>>>> You've said so before, yes, but I continue to disagree. And as we can see
>>>>>>>> from the series things can be made work consistently (and imo nothing else
>>>>>>>> should have been done right from the beginning).
>>>>>>>>
>>>>>>>
>>>>>>> There are inconsistencies in ISA.
>>>>>>
>>>>>> Sure. But we shouldn't add further ones in the assembler.
>>>>>
>>>>> Assembler just follows ISA.  Programmers should learn to
>>>>> deal with it or use a compiler.
>>>>
>>>> This is entirely non-constructive. Assembler writers should get things into
>>>> usable (read: consistent) shape. Plus what ISA are you talking about here?
>>>
>>> GNU assembler has been this way for a long time and the current GNU
>>> assembler will still be in use for the next few years.  Assembler writers
>>> should know about all these.
>>
>> Hmm, so after all not any ISA to follow? Plus do you suggest there's
>> only people having written x86 assembler code for many years? And
>> there's no people who would prefer to get their code into more
>> consistent shape, but who are limited by assembler shortcomings?
> 
> I prefer consistency with existing assemblers and ISA specs.
> For new instructions, suffixes should be used only when there is
> an ambiguity.  For existing instructions, to support existing assembly
> codes, suffixes may be optional even when there is an ambiguity.
> But for integer MOVS instructions, suffixes have always been required.
> I don't think we should change them now.  We can improve documentation
> if needed.
> 
>> In any event, ...
>>
>>>> We're talking of mnemonics which aren't spelled out in any ISA document
>>>> anyway. The only halfway official AT&T doc I'm aware of doesn't provide
>>>> room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
>>>> and elsewhere (recently: CMPccXADD) you're even suggesting to force people
>>>> to omit suffixes (plus you've previously objected to the disassembler to
>>>> consistently emit them in suffix-always mode).
>>>>
>>>> That same document was also only updated to cover 64-bit code in a half-
>>>> hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
>>>> list any form of MOVSXD at all afaics, for example). Where not explicitly
>>>> mentioned, their intended handling can only be inferred by using analogies.
>>>> Nor do we support some of the odd (quirky I would say) mnemonics that are
>>>> listed there, like xchglA or movabsbA (which is even wrongly described as
>>>> moving an immediate value into the register).
>>>>
>>>> Bottom line: May I please ask that you take another (constructive) look at
>>>> the v4 submission?
>>
>> ... this request continues to stand.
> 
> I think we should improve diagnostics and documentation, not add new
> syntaxes to existing instructions.

I continue to disagree, but to make at least some forward progress: Which
parts of this series can I expect an okay for (patches 5-7 already have
one, but can't go in ahead)? I would try to re-order the series then to
put the controversial patch(es) at the end, such that at least parts can
go in and I can make further progress with other work. But there's no
promise this re-ordering actually is going to work out if it's more than
just this one patch which you continue to object to.

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-19  6:08                         ` Jan Beulich
@ 2022-10-19 21:46                           ` H.J. Lu
  2022-10-20 10:12                             ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2022-10-19 21:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils

On Tue, Oct 18, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.10.2022 23:48, H.J. Lu wrote:
> > On Mon, Oct 17, 2022 at 11:31 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 18.10.2022 00:36, H.J. Lu wrote:
> >>> On Mon, Oct 17, 2022 at 12:02 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 14.10.2022 19:07, H.J. Lu wrote:
> >>>>> On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 13.10.2022 19:00, H.J. Lu wrote:
> >>>>>>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>
> >>>>>>>> On 12.10.2022 19:10, H.J. Lu wrote:
> >>>>>>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
> >>>>>>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> PR gas/29524
> >>>>>>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
> >>>>>>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
> >>>>>>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
> >>>>>>>>>>>
> >>>>>>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
> >>>>>>>>>>> require a suffix to specify the destination size.  This is different from other
> >>>>>>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
> >>>>>>>>>>> it won't be easy to use.  We should improve error messages, but allowing
> >>>>>>>>>>> new syntax doesn't help much.
> >>>>>>>>>>
> >>>>>>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
> >>>>>>>>>
> >>>>>>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
> >>>>>>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
> >>>>>>>>> syntax should be expected.
> >>>>>>>>
> >>>>>>>> You've said so before, yes, but I continue to disagree. And as we can see
> >>>>>>>> from the series things can be made work consistently (and imo nothing else
> >>>>>>>> should have been done right from the beginning).
> >>>>>>>>
> >>>>>>>
> >>>>>>> There are inconsistencies in ISA.
> >>>>>>
> >>>>>> Sure. But we shouldn't add further ones in the assembler.
> >>>>>
> >>>>> Assembler just follows ISA.  Programmers should learn to
> >>>>> deal with it or use a compiler.
> >>>>
> >>>> This is entirely non-constructive. Assembler writers should get things into
> >>>> usable (read: consistent) shape. Plus what ISA are you talking about here?
> >>>
> >>> GNU assembler has been this way for a long time and the current GNU
> >>> assembler will still be in use for the next few years.  Assembler writers
> >>> should know about all these.
> >>
> >> Hmm, so after all not any ISA to follow? Plus do you suggest there's
> >> only people having written x86 assembler code for many years? And
> >> there's no people who would prefer to get their code into more
> >> consistent shape, but who are limited by assembler shortcomings?
> >
> > I prefer consistency with existing assemblers and ISA specs.
> > For new instructions, suffixes should be used only when there is
> > an ambiguity.  For existing instructions, to support existing assembly
> > codes, suffixes may be optional even when there is an ambiguity.
> > But for integer MOVS instructions, suffixes have always been required.
> > I don't think we should change them now.  We can improve documentation
> > if needed.
> >
> >> In any event, ...
> >>
> >>>> We're talking of mnemonics which aren't spelled out in any ISA document
> >>>> anyway. The only halfway official AT&T doc I'm aware of doesn't provide
> >>>> room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
> >>>> and elsewhere (recently: CMPccXADD) you're even suggesting to force people
> >>>> to omit suffixes (plus you've previously objected to the disassembler to
> >>>> consistently emit them in suffix-always mode).
> >>>>
> >>>> That same document was also only updated to cover 64-bit code in a half-
> >>>> hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
> >>>> list any form of MOVSXD at all afaics, for example). Where not explicitly
> >>>> mentioned, their intended handling can only be inferred by using analogies.
> >>>> Nor do we support some of the odd (quirky I would say) mnemonics that are
> >>>> listed there, like xchglA or movabsbA (which is even wrongly described as
> >>>> moving an immediate value into the register).
> >>>>
> >>>> Bottom line: May I please ask that you take another (constructive) look at
> >>>> the v4 submission?
> >>
> >> ... this request continues to stand.
> >
> > I think we should improve diagnostics and documentation, not add new
> > syntaxes to existing instructions.
>
> I continue to disagree, but to make at least some forward progress: Which
> parts of this series can I expect an okay for (patches 5-7 already have
> one, but can't go in ahead)? I would try to re-order the series then to
> put the controversial patch(es) at the end, such that at least parts can
> go in and I can make further progress with other work. But there's no
> promise this re-ordering actually is going to work out if it's more than
> just this one patch which you continue to object to.
>

Please submit a new patch set without MOVS changes.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL
  2022-10-19 21:46                           ` H.J. Lu
@ 2022-10-20 10:12                             ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2022-10-20 10:12 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 19.10.2022 23:46, H.J. Lu wrote:
> On Tue, Oct 18, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 18.10.2022 23:48, H.J. Lu wrote:
>>> On Mon, Oct 17, 2022 at 11:31 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 18.10.2022 00:36, H.J. Lu wrote:
>>>>> On Mon, Oct 17, 2022 at 12:02 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 14.10.2022 19:07, H.J. Lu wrote:
>>>>>>> On Fri, Oct 14, 2022 at 12:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 13.10.2022 19:00, H.J. Lu wrote:
>>>>>>>>> On Wed, Oct 12, 2022 at 11:08 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 12.10.2022 19:10, H.J. Lu wrote:
>>>>>>>>>>> On Wed, Oct 12, 2022 at 12:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 11.10.2022 19:44, H.J. Lu wrote:
>>>>>>>>>>>>> On Wed, Oct 5, 2022 at 12:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PR gas/29524
>>>>>>>>>>>>>> In order to make MOVSL{,Q} behave similarly to MOVSB{W,L,Q} and
>>>>>>>>>>>>>> MOVSW{L,Q} we need to defer parse_insn()'s emitting of errors unrelated
>>>>>>>>>>>>>> to prefix parsing. Utilize i.error just like match_template() does.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since movs{b,w,l,q} are string instructions, integer sign extensions
>>>>>>>>>>>>> require a suffix to specify the destination size.  This is different from other
>>>>>>>>>>>>> integer instructions.  Since only the new assembler allows the implicit suffix,
>>>>>>>>>>>>> it won't be easy to use.  We should improve error messages, but allowing
>>>>>>>>>>>>> new syntax doesn't help much.
>>>>>>>>>>>>
>>>>>>>>>>>> It is an earlier change making most of this consistent with MOVZ*; it is
>>>>>>>>>>>
>>>>>>>>>>> MOVZ is different.  There are no MOVZ string instructions.  MOVS has
>>>>>>>>>>> different meanings in ISA.   MOVS difference from MOVZ in assembly
>>>>>>>>>>> syntax should be expected.
>>>>>>>>>>
>>>>>>>>>> You've said so before, yes, but I continue to disagree. And as we can see
>>>>>>>>>> from the series things can be made work consistently (and imo nothing else
>>>>>>>>>> should have been done right from the beginning).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There are inconsistencies in ISA.
>>>>>>>>
>>>>>>>> Sure. But we shouldn't add further ones in the assembler.
>>>>>>>
>>>>>>> Assembler just follows ISA.  Programmers should learn to
>>>>>>> deal with it or use a compiler.
>>>>>>
>>>>>> This is entirely non-constructive. Assembler writers should get things into
>>>>>> usable (read: consistent) shape. Plus what ISA are you talking about here?
>>>>>
>>>>> GNU assembler has been this way for a long time and the current GNU
>>>>> assembler will still be in use for the next few years.  Assembler writers
>>>>> should know about all these.
>>>>
>>>> Hmm, so after all not any ISA to follow? Plus do you suggest there's
>>>> only people having written x86 assembler code for many years? And
>>>> there's no people who would prefer to get their code into more
>>>> consistent shape, but who are limited by assembler shortcomings?
>>>
>>> I prefer consistency with existing assemblers and ISA specs.
>>> For new instructions, suffixes should be used only when there is
>>> an ambiguity.  For existing instructions, to support existing assembly
>>> codes, suffixes may be optional even when there is an ambiguity.
>>> But for integer MOVS instructions, suffixes have always been required.
>>> I don't think we should change them now.  We can improve documentation
>>> if needed.
>>>
>>>> In any event, ...
>>>>
>>>>>> We're talking of mnemonics which aren't spelled out in any ISA document
>>>>>> anyway. The only halfway official AT&T doc I'm aware of doesn't provide
>>>>>> room for omitting size suffixes [1]. Yet that's a fundamental feature of gas,
>>>>>> and elsewhere (recently: CMPccXADD) you're even suggesting to force people
>>>>>> to omit suffixes (plus you've previously objected to the disassembler to
>>>>>> consistently emit them in suffix-always mode).
>>>>>>
>>>>>> That same document was also only updated to cover 64-bit code in a half-
>>>>>> hearted way, so can't necessarily be used for 64-bit only insns (it doesn't
>>>>>> list any form of MOVSXD at all afaics, for example). Where not explicitly
>>>>>> mentioned, their intended handling can only be inferred by using analogies.
>>>>>> Nor do we support some of the odd (quirky I would say) mnemonics that are
>>>>>> listed there, like xchglA or movabsbA (which is even wrongly described as
>>>>>> moving an immediate value into the register).
>>>>>>
>>>>>> Bottom line: May I please ask that you take another (constructive) look at
>>>>>> the v4 submission?
>>>>
>>>> ... this request continues to stand.
>>>
>>> I think we should improve diagnostics and documentation, not add new
>>> syntaxes to existing instructions.
>>
>> I continue to disagree, but to make at least some forward progress: Which
>> parts of this series can I expect an okay for (patches 5-7 already have
>> one, but can't go in ahead)? I would try to re-order the series then to
>> put the controversial patch(es) at the end, such that at least parts can
>> go in and I can make further progress with other work. But there's no
>> promise this re-ordering actually is going to work out if it's more than
>> just this one patch which you continue to object to.
>>
> 
> Please submit a new patch set without MOVS changes.

As said - I'll put those in a separate patch at the end of the series. Not
the least because you'll be a little disappointed: The changes to tc-i386.c
simply need to move to another patch then. Hence there's not going to be
much left to make move-with-sign-extend consistent in that final patch
(most of it is then testsuite adjustments/additions).

Jan

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-10-20 10:12 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-05  7:19 [PATCH v3 0/7] x86: suffix handling changes Jan Beulich
2022-10-05  7:20 ` [PATCH v3 1/7] x86: constify parse_insn()'s input Jan Beulich
2022-10-05  7:22 ` [PATCH v3 2/7] x86: introduce Pass2 insn attribute Jan Beulich
2022-10-05  7:23 ` [PATCH v3 3/7] x86: re-work insn/suffix recognition Jan Beulich
2022-10-05 23:52   ` H.J. Lu
2022-10-06  6:15     ` Jan Beulich
2022-10-06  6:58       ` Jan Beulich
2022-10-06 15:28         ` H.J. Lu
2022-10-06 16:12           ` Jan Beulich
2022-10-06 18:41             ` H.J. Lu
2022-10-07 13:03               ` Jan Beulich
2022-10-05  7:24 ` [PATCH v3 4/7] x86-64: further re-work insn/suffix recognition to also cover MOVSL Jan Beulich
2022-10-11 17:44   ` H.J. Lu
2022-10-12  7:08     ` Jan Beulich
2022-10-12 17:10       ` H.J. Lu
2022-10-13  6:08         ` Jan Beulich
2022-10-13 17:00           ` H.J. Lu
2022-10-14  7:03             ` Jan Beulich
2022-10-14 17:07               ` H.J. Lu
2022-10-17  7:02                 ` Jan Beulich
2022-10-17 22:36                   ` H.J. Lu
2022-10-18  6:31                     ` Jan Beulich
2022-10-18 21:48                       ` H.J. Lu
2022-10-19  6:08                         ` Jan Beulich
2022-10-19 21:46                           ` H.J. Lu
2022-10-20 10:12                             ` Jan Beulich
2022-10-05  7:24 ` [PATCH v3 5/7] ix86: don't recognize/derive Q suffix in the common case Jan Beulich
2022-10-11 17:49   ` H.J. Lu
2022-10-05  7:25 ` [PATCH v3 6/7] x86-64: allow HLE store of accumulator to absolute 32-bit address Jan Beulich
2022-10-11 17:50   ` H.J. Lu
2022-10-05  7:25 ` [PATCH v3 7/7] x86: move bad-use-of-TLS-reloc check Jan Beulich
2022-10-11 17:57   ` H.J. Lu
2022-10-12  7:13     ` Jan Beulich
2022-10-12 17:02       ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).