public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler
@ 2022-06-03 12:05 Tsukasa OI
  2022-06-03 12:05 ` [PATCH 1/3] RISC-V: Split disasm opcode matching and printing Tsukasa OI
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Tsukasa OI @ 2022-06-03 12:05 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt; +Cc: binutils

Hello,

I noticed another disassembler issue and this is a fix.


As I submitted in my previous patchset:
<https://sourceware.org/pipermail/binutils/2022-June/121159.html>,
there is a problem while disassembling zext.h instruction on
RV{32,64}_Zbb_Zbkb configuration.

On RV32, zext.h on Zbb is a specialized form of pack (Zbkb).
On RV64, zext.h on Zbb is a specialized form of packw (Zbkb).

Note that, if Zbkb extension is disabled, zext.h is a single
instruction with no "generalized" forms.


When **both** Zbb and Zbkb extensions are enabled and a zext.h
instruction is disassembled with `-M no-aliases', it should print
a non-alias instruction (that is, either pack or packw).

    # Input source file
    _start:
        zext.h a0, a1   # Specialized form on Zbb

    # Expected (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
    80000028 <_start>:
    80000028:	0805c533          	pack	a0,a1,zero

However, it prints an alias.

    # Actual (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
    80000028 <_start>:
    80000028:	0805c533          	zext.h	a0,a1

Note that, depending on -march, an "alias" is no longer an alias
(as I noted earlier).

    # Expected/Actual (RV32_Zbb, objdump -d -M no-aliases)
    80000028 <_start>:
    80000028:	0805c533          	zext.h	a0,a1


In general, this kind of issue occurs when:

1.  There are two instructions
    (one of them is a specialized form of another).
2.  Those requirements are different (separate) but can co-exist.

Because of 2., both instructions cannot be simple aliases (INSN_ALIAS
cannot be used).  However on non-alias instructions, if a match is
found, riscv_disassemble_insn thinks this is it and quits too early.

Because zext.h is declared before pack and packw, generalized forms are
not matched for zext.h.




As a solution, I propose a new concept: generic subsets.

For instructions with INSN_GENERICS, opcode matching cannot early-quit.
Instead it searches an instruction with:

-   Longest mask (by default; when -M no-aliases is NOT specified)
-   Shortest mask (when -M no-aliases is specified)

"Length" of the mask equals population count.  More one bits on the mask
means more specialized instruction form.

It fixes disassembler on following instructions and configurations:

-   zext.h  (Zbb)       <--> pack       (Zbkb; RV32)
-   zext.h  (Zbb)       <--> packw      (Zbkb; RV64)

Note that INSN_GENERICS (new flag in PATCH 2) must be set **both** on
specialized and generic forms.  In the example above, those instructions
require INSN_GENERICS:

-   zext.h (two XLEN-specific forms)
-   pack
-   packw

This concept can be used to following instruction pairs where the same
issue can occur once non-ratified instruction is ratified.

-   orc.b   (Zbb)       <--> gorci      (NOT RATIFIED)
-   brev8   (Zbkb)      <--> grevi      (NOT RATIFIED)
-   zip     (Zbkb)      <--> shfli      (NOT RATIFIED)
-   unzip   (Zbkb)      <--> unshfli    (NOT RATIFIED)
-   rev8    (Zbb/Zbkb)  <--> grevi      (NOT RATIFIED)




PATCH 1: Reorganize riscv_disassemble_insn function

riscv_disassemble_insn function (before PATCH 1) works like this:

    FOREACH (opcode)
    {
        if (!opcode.MATCH(insn))
            continue;
        //
        // Print insn with matched opcode.
        //
        return insnlen;
    }
    //
    // opcode not matched.
    //

Because instruction printing code is in the opcode loop, this is not
good to fix the problem.  This patch reorganizes the function like this:

    matched_opcode = NULL;
    // Step 1: opcode matching
    FOREACH (opcode)
    {
        if (!opcode.MATCH(insn))
            continue;
        matched_opcode = opcode;
        break;
    }
    // Step 2: opcode printing
    if (matched_opcode != NULL)
    {
        //
        // Print insn with matched_opcode.
        //
        return insnlen;
    }
    //
    // opcode not matched.
    //

PATCH 1 splits opcode loop to two separate steps: opcode matching and
printing.  With this, we can modify opcode matching step to implement
"generic subsets".




PATCH 2: Implement "generic subsets"

With this commit, riscv_disassemble_insn function works like this:

    int masklen;
    matched_opcode = NULL;
    // Step 1: opcode matching
    FOREACH (opcode)
    {
        if (!opcode.MATCH(insn))
            continue;
        if (!(opcode.pinfo & INSN_GENERICS))
        {
            matched_opcode = opcode;
            break;  // Early-quit as before (for non generic subsets)
        }
        // Handling of generic subsets
        if (matched_opcode == NULL)
        {
            matched_opcode = opcode;
            // Set corresponding masklen.
        }
        else if (no_aliases)
        {
            // Prefer opcode with shortest mask.
            // Update match_opcode and masklen where necessary.
        }
        else
        {
            // Prefer opcode with longest mask.
            // Update match_opcode and masklen where necessary.
        }
    }
    // Step 2: opcode printing (continues...)

PATCH 2 also adds INSN_GENERICS flag to two zext.h forms and pack/packw
instructions.




PATCH 3 contains additional testcases.




Thanks,
Tsukasa




Tsukasa OI (3):
  RISC-V: Split disasm opcode matching and printing
  RISC-V: Implement "generic subsets" for disasm
  RISC-V: Add testcases disassembling zext.h

 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s   |  2 +
 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d   | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s   |  4 +
 include/opcode/riscv.h                       |  4 +
 opcodes/riscv-dis.c                          | 93 ++++++++++++--------
 opcodes/riscv-opc.c                          |  8 +-
 9 files changed, 114 insertions(+), 41 deletions(-)
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s


base-commit: 625b6eae091709b95471eae92d42dc6bc71e6553
-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] RISC-V: Split disasm opcode matching and printing
  2022-06-03 12:05 [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
@ 2022-06-03 12:05 ` Tsukasa OI
  2022-06-03 12:05 ` [PATCH 2/3] RISC-V: Implement "generic subsets" for disasm Tsukasa OI
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Tsukasa OI @ 2022-06-03 12:05 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt; +Cc: binutils

This commit splits instruction handling on riscv_disassemble_insn
function to two separate steps (opcode matching and printing).
This is a preparation for much complex opcode matching.

opcodes/ChangeLog:

	* riscv-dis.c (riscv_disassemble_insn): Split instruction
	handling to two separate steps: opcode matching and printing.
---
 opcodes/riscv-dis.c | 85 ++++++++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 39 deletions(-)

diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 9ff31167775..48858e61c38 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -572,7 +572,7 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 static int
 riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 {
-  const struct riscv_opcode *op;
+  const struct riscv_opcode *op, *matched_op;
   static bool init = 0;
   static const struct riscv_opcode *riscv_hash[OP_MASK_OP + 1];
   struct riscv_private_data *pd;
@@ -624,6 +624,7 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
   info->target2 = 0;
 
   op = riscv_hash[OP_HASH_IDX (word)];
+  matched_op = NULL;
   if (op != NULL)
     {
       /* If XLEN is not known, get its value from the ELF class.  */
@@ -656,49 +657,55 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 	  if (!riscv_multi_subset_supports (&riscv_rps_dis, op->insn_class))
 	    continue;
 
-	  /* It's a match.  */
-	  (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
-					"%s", op->name);
-	  print_insn_args (op->args, word, memaddr, info);
+	  matched_op = op;
+	  break;
+	}
+    }
 
-	  /* Try to disassemble multi-instruction addressing sequences.  */
-	  if (pd->print_addr != (bfd_vma)-1)
-	    {
-	      info->target = pd->print_addr;
-	      (*info->fprintf_styled_func)
-		(info->stream, dis_style_comment_start, " # ");
-	      (*info->print_address_func) (info->target, info);
-	      pd->print_addr = -1;
-	    }
+  /* There is a match.  */
+  if (matched_op != NULL)
+    {
+      (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
+				    "%s", matched_op->name);
+      print_insn_args (matched_op->args, word, memaddr, info);
 
-	  /* Finish filling out insn_info fields.  */
-	  switch (op->pinfo & INSN_TYPE)
-	    {
-	    case INSN_BRANCH:
-	      info->insn_type = dis_branch;
-	      break;
-	    case INSN_CONDBRANCH:
-	      info->insn_type = dis_condbranch;
-	      break;
-	    case INSN_JSR:
-	      info->insn_type = dis_jsr;
-	      break;
-	    case INSN_DREF:
-	      info->insn_type = dis_dref;
-	      break;
-	    default:
-	      break;
-	    }
+      /* Try to disassemble multi-instruction addressing sequences.  */
+      if (pd->print_addr != (bfd_vma)-1)
+	{
+	  info->target = pd->print_addr;
+	  (*info->fprintf_styled_func)
+	    (info->stream, dis_style_comment_start, " # ");
+	  (*info->print_address_func) (info->target, info);
+	  pd->print_addr = -1;
+	}
 
-	  if (op->pinfo & INSN_DATA_SIZE)
-	    {
-	      int size = ((op->pinfo & INSN_DATA_SIZE)
-			  >> INSN_DATA_SIZE_SHIFT);
-	      info->data_size = 1 << (size - 1);
-	    }
+      /* Finish filling out insn_info fields.  */
+      switch (matched_op->pinfo & INSN_TYPE)
+	{
+	case INSN_BRANCH:
+	  info->insn_type = dis_branch;
+	  break;
+	case INSN_CONDBRANCH:
+	  info->insn_type = dis_condbranch;
+	  break;
+	case INSN_JSR:
+	  info->insn_type = dis_jsr;
+	  break;
+	case INSN_DREF:
+	  info->insn_type = dis_dref;
+	  break;
+	default:
+	  break;
+	}
 
-	  return insnlen;
+      if (matched_op->pinfo & INSN_DATA_SIZE)
+	{
+	  int size = ((matched_op->pinfo & INSN_DATA_SIZE)
+		      >> INSN_DATA_SIZE_SHIFT);
+	  info->data_size = 1 << (size - 1);
 	}
+
+      return insnlen;
     }
 
   /* We did not find a match, so just print the instruction bits.  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/3] RISC-V: Implement "generic subsets" for disasm
  2022-06-03 12:05 [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
  2022-06-03 12:05 ` [PATCH 1/3] RISC-V: Split disasm opcode matching and printing Tsukasa OI
@ 2022-06-03 12:05 ` Tsukasa OI
  2022-06-03 12:05 ` [PATCH 3/3] RISC-V: Add testcases disassembling zext.h Tsukasa OI
  2022-07-30  4:20 ` [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
  3 siblings, 0 replies; 5+ messages in thread
From: Tsukasa OI @ 2022-06-03 12:05 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt; +Cc: binutils

There's a situation where an instruction is a specialized form of
another but they have different requirements (that can co-exist).  On
such cases, no-aliases option on diassembler will not work as expected.
This commit implements so called "generic subsets" to work with such
situation.

include/ChangeLog:

	* opcode/riscv.h (INSN_GENERICS): New.

opcodes/ChangeLog:

	* riscv-dis.c (riscv_disassemble_insn): Implement so called
	"generic subsets" for disassembler.
	* riscv-opc.c (riscv_opcodes): Add INSN_GENERICS to zext.h,
	pack and packw instructions.
---
 include/opcode/riscv.h |  4 ++++
 opcodes/riscv-dis.c    | 18 +++++++++++++++---
 opcodes/riscv-opc.c    |  8 ++++----
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/opcode/riscv.h b/include/opcode/riscv.h
index f0beceaec44..b526dd8fc0d 100644
--- a/include/opcode/riscv.h
+++ b/include/opcode/riscv.h
@@ -438,6 +438,10 @@ struct riscv_opcode
 /* Instruction is a simple alias (e.g. "mv" for "addi").  */
 #define	INSN_ALIAS		0x00000001
 
+/* Instruction can be a simple instruction or an alias
+   depending on enabled ISA extensions. */
+#define INSN_GENERICS		0x00000100
+
 /* These are for setting insn_info fields.
 
    Nonbranch is the default.  Noninsn is used only if there is no match.
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 48858e61c38..74655a2acb5 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -576,7 +576,7 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
   static bool init = 0;
   static const struct riscv_opcode *riscv_hash[OP_MASK_OP + 1];
   struct riscv_private_data *pd;
-  int insnlen;
+  int insnlen, masklen = -1;
 
 #define OP_HASH_IDX(i) ((i) & (riscv_insn_length (i) == 2 ? 0x3 : OP_MASK_OP))
 
@@ -657,8 +657,20 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 	  if (!riscv_multi_subset_supports (&riscv_rps_dis, op->insn_class))
 	    continue;
 
-	  matched_op = op;
-	  break;
+	  if (! (op->pinfo & INSN_GENERICS))
+	    {
+	      matched_op = op;
+	      break;
+	    }
+
+	  int curmasklen = __builtin_popcountll(op->mask);
+	  if (matched_op == NULL || (no_aliases
+				     ? (curmasklen < masklen)
+				     : (curmasklen >= masklen)))
+	    {
+	      matched_op = op;
+	      masklen = curmasklen;
+	    }
 	}
     }
 
diff --git a/opcodes/riscv-opc.c b/opcodes/riscv-opc.c
index 6355f8059f5..0eae9a788ed 100644
--- a/opcodes/riscv-opc.c
+++ b/opcodes/riscv-opc.c
@@ -933,8 +933,8 @@ const struct riscv_opcode riscv_opcodes[] =
 {"sext.b",     0, INSN_CLASS_I,         "d,s",   0, (int) M_SEXTB, match_never, INSN_MACRO },
 {"sext.h",     0, INSN_CLASS_ZBB,  "d,s",   MATCH_SEXT_H, MASK_SEXT_H, match_opcode, 0 },
 {"sext.h",     0, INSN_CLASS_I,         "d,s",   0, (int) M_SEXTH, match_never, INSN_MACRO },
-{"zext.h",    32, INSN_CLASS_ZBB,  "d,s",   MATCH_PACK, MASK_PACK | MASK_RS2, match_opcode, 0 },
-{"zext.h",    64, INSN_CLASS_ZBB,  "d,s",   MATCH_PACKW, MASK_PACKW | MASK_RS2, match_opcode, 0 },
+{"zext.h",    32, INSN_CLASS_ZBB,  "d,s",   MATCH_PACK, MASK_PACK | MASK_RS2, match_opcode, INSN_GENERICS },
+{"zext.h",    64, INSN_CLASS_ZBB,  "d,s",   MATCH_PACKW, MASK_PACKW | MASK_RS2, match_opcode, INSN_GENERICS },
 {"zext.h",     0, INSN_CLASS_I,         "d,s",   0, (int) M_ZEXTH, match_never, INSN_MACRO },
 {"orc.b",      0, INSN_CLASS_ZBB,  "d,s",   MATCH_GORCI | MATCH_SHAMT_ORC_B, MASK_GORCI | MASK_SHAMT, match_opcode, 0 },
 {"clzw",      64, INSN_CLASS_ZBB,  "d,s",   MATCH_CLZW, MASK_CLZW, match_opcode, 0 },
@@ -944,9 +944,9 @@ const struct riscv_opcode riscv_opcodes[] =
 {"brev8",     64, INSN_CLASS_ZBKB,  "d,s",      MATCH_GREVI | MATCH_SHAMT_BREV8, MASK_GREVI | MASK_SHAMT, match_opcode, 0 },
 {"zip",       32, INSN_CLASS_ZBKB,  "d,s",      MATCH_SHFLI|MATCH_SHAMT_ZIP_32, MASK_SHFLI|MASK_SHAMT, match_opcode, INSN_ALIAS },
 {"unzip",     32, INSN_CLASS_ZBKB,  "d,s",      MATCH_UNSHFLI|MATCH_SHAMT_ZIP_32, MASK_UNSHFLI|MASK_SHAMT, match_opcode, INSN_ALIAS },
-{"pack",       0, INSN_CLASS_ZBKB,  "d,s,t",    MATCH_PACK, MASK_PACK, match_opcode, 0 },
+{"pack",       0, INSN_CLASS_ZBKB,  "d,s,t",    MATCH_PACK, MASK_PACK, match_opcode, INSN_GENERICS },
 {"packh",      0, INSN_CLASS_ZBKB,  "d,s,t",    MATCH_PACKH, MASK_PACKH, match_opcode, 0 },
-{"packw",     64, INSN_CLASS_ZBKB,  "d,s,t",    MATCH_PACKW, MASK_PACKW, match_opcode, 0 },
+{"packw",     64, INSN_CLASS_ZBKB,  "d,s,t",    MATCH_PACKW, MASK_PACKW, match_opcode, INSN_GENERICS },
 {"andn",       0, INSN_CLASS_ZBB_OR_ZBKB,  "d,s,t", MATCH_ANDN, MASK_ANDN, match_opcode, 0 },
 {"orn",        0, INSN_CLASS_ZBB_OR_ZBKB,  "d,s,t", MATCH_ORN, MASK_ORN, match_opcode, 0 },
 {"xnor",       0, INSN_CLASS_ZBB_OR_ZBKB,  "d,s,t", MATCH_XNOR, MASK_XNOR, match_opcode, 0 },
-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 3/3] RISC-V: Add testcases disassembling zext.h
  2022-06-03 12:05 [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
  2022-06-03 12:05 ` [PATCH 1/3] RISC-V: Split disasm opcode matching and printing Tsukasa OI
  2022-06-03 12:05 ` [PATCH 2/3] RISC-V: Implement "generic subsets" for disasm Tsukasa OI
@ 2022-06-03 12:05 ` Tsukasa OI
  2022-07-30  4:20 ` [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
  3 siblings, 0 replies; 5+ messages in thread
From: Tsukasa OI @ 2022-06-03 12:05 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt; +Cc: binutils

This commit adds testcases for zext.h instruction (Zbb extension) on
diassembler.

gas/ChangeLog:

	* testsuite/gas/riscv/zbb-zext.h-dis-1.s: New test.
	* testsuite/gas/riscv/zbb-zext.h-dis-2.s: Likewise.
	* testsuite/gas/riscv/zbb-zext.h-dis-1-1.d: Likewise.
	* testsuite/gas/riscv/zbb-zext.h-dis-1-2.d: Likewise.
	* testsuite/gas/riscv/zbb-zext.h-dis-1-3.d: Likewise.
	* testsuite/gas/riscv/zbb-zext.h-dis-2.d: Likewise.
---
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d | 11 +++++++++++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d | 11 +++++++++++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d | 11 +++++++++++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s   |  2 ++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d   | 11 +++++++++++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s   |  4 ++++
 6 files changed, 50 insertions(+)
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s

diff --git a/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
new file mode 100644
index 00000000000..13e3d610342
--- /dev/null
+++ b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
@@ -0,0 +1,11 @@
+#as: -march=rv32i_zbb_zbkb
+#source: zbb-zext.h-dis-1.s
+#objdump: -d
+
+.*:[ 	]+file format .*
+
+
+Disassembly of section .text:
+
+0+000 <target>:
+[ 	]+[0-9a-f]+:[ 	]+0805c533[ 	]+zext.h[ 	]+a0,a1
diff --git a/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
new file mode 100644
index 00000000000..e9218a8e3d2
--- /dev/null
+++ b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
@@ -0,0 +1,11 @@
+#as: -march=rv32i_zbb_zbkb
+#source: zbb-zext.h-dis-1.s
+#objdump: -d -M no-aliases
+
+.*:[ 	]+file format .*
+
+
+Disassembly of section .text:
+
+0+000 <target>:
+[ 	]+[0-9a-f]+:[ 	]+0805c533[ 	]+pack[ 	]+a0,a1,zero
diff --git a/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
new file mode 100644
index 00000000000..629da4ff984
--- /dev/null
+++ b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
@@ -0,0 +1,11 @@
+#as: -march=rv32i_zbb
+#source: zbb-zext.h-dis-1.s
+#objdump: -d -M no-aliases
+
+.*:[ 	]+file format .*
+
+
+Disassembly of section .text:
+
+0+000 <target>:
+[ 	]+[0-9a-f]+:[ 	]+0805c533[ 	]+zext.h[ 	]+a0,a1
diff --git a/gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
new file mode 100644
index 00000000000..ab04765dd5d
--- /dev/null
+++ b/gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
@@ -0,0 +1,2 @@
+target:
+	zext.h	a0, a1
diff --git a/gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d b/gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
new file mode 100644
index 00000000000..3b1ee9de01e
--- /dev/null
+++ b/gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
@@ -0,0 +1,11 @@
+#as: -march=rv32i_zbkb
+#source: zbb-zext.h-dis-2.s
+#objdump: -d
+
+.*:[ 	]+file format .*
+
+
+Disassembly of section .text:
+
+0+000 <target>:
+[ 	]+[0-9a-f]+:[ 	]+0805c533[ 	]+pack[ 	]+a0,a1,zero
diff --git a/gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s b/gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s
new file mode 100644
index 00000000000..88067d3e87d
--- /dev/null
+++ b/gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s
@@ -0,0 +1,4 @@
+target:
+	.option	arch, +zbb
+	zext.h	a0, a1
+	.option	arch, -zbb
-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler
  2022-06-03 12:05 [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
                   ` (2 preceding siblings ...)
  2022-06-03 12:05 ` [PATCH 3/3] RISC-V: Add testcases disassembling zext.h Tsukasa OI
@ 2022-07-30  4:20 ` Tsukasa OI
  3 siblings, 0 replies; 5+ messages in thread
From: Tsukasa OI @ 2022-07-30  4:20 UTC (permalink / raw)
  To: Kito Cheng, Palmer Dabbelt, Nelson Chu; +Cc: binutils

This patch is postponed because I will merge this into a larger patchset
with general core disassembler improvements.

Thanks,
Tsukasa

On 2022/06/03 21:05, Tsukasa OI wrote:
> Hello,
> 
> I noticed another disassembler issue and this is a fix.
> 
> 
> As I submitted in my previous patchset:
> <https://sourceware.org/pipermail/binutils/2022-June/121159.html>,
> there is a problem while disassembling zext.h instruction on
> RV{32,64}_Zbb_Zbkb configuration.
> 
> On RV32, zext.h on Zbb is a specialized form of pack (Zbkb).
> On RV64, zext.h on Zbb is a specialized form of packw (Zbkb).
> 
> Note that, if Zbkb extension is disabled, zext.h is a single
> instruction with no "generalized" forms.
> 
> 
> When **both** Zbb and Zbkb extensions are enabled and a zext.h
> instruction is disassembled with `-M no-aliases', it should print
> a non-alias instruction (that is, either pack or packw).
> 
>     # Input source file
>     _start:
>         zext.h a0, a1   # Specialized form on Zbb
> 
>     # Expected (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
>     80000028 <_start>:
>     80000028:	0805c533          	pack	a0,a1,zero
> 
> However, it prints an alias.
> 
>     # Actual (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
>     80000028 <_start>:
>     80000028:	0805c533          	zext.h	a0,a1
> 
> Note that, depending on -march, an "alias" is no longer an alias
> (as I noted earlier).
> 
>     # Expected/Actual (RV32_Zbb, objdump -d -M no-aliases)
>     80000028 <_start>:
>     80000028:	0805c533          	zext.h	a0,a1
> 
> 
> In general, this kind of issue occurs when:
> 
> 1.  There are two instructions
>     (one of them is a specialized form of another).
> 2.  Those requirements are different (separate) but can co-exist.
> 
> Because of 2., both instructions cannot be simple aliases (INSN_ALIAS
> cannot be used).  However on non-alias instructions, if a match is
> found, riscv_disassemble_insn thinks this is it and quits too early.
> 
> Because zext.h is declared before pack and packw, generalized forms are
> not matched for zext.h.
> 
> 
> 
> 
> As a solution, I propose a new concept: generic subsets.
> 
> For instructions with INSN_GENERICS, opcode matching cannot early-quit.
> Instead it searches an instruction with:
> 
> -   Longest mask (by default; when -M no-aliases is NOT specified)
> -   Shortest mask (when -M no-aliases is specified)
> 
> "Length" of the mask equals population count.  More one bits on the mask
> means more specialized instruction form.
> 
> It fixes disassembler on following instructions and configurations:
> 
> -   zext.h  (Zbb)       <--> pack       (Zbkb; RV32)
> -   zext.h  (Zbb)       <--> packw      (Zbkb; RV64)
> 
> Note that INSN_GENERICS (new flag in PATCH 2) must be set **both** on
> specialized and generic forms.  In the example above, those instructions
> require INSN_GENERICS:
> 
> -   zext.h (two XLEN-specific forms)
> -   pack
> -   packw
> 
> This concept can be used to following instruction pairs where the same
> issue can occur once non-ratified instruction is ratified.
> 
> -   orc.b   (Zbb)       <--> gorci      (NOT RATIFIED)
> -   brev8   (Zbkb)      <--> grevi      (NOT RATIFIED)
> -   zip     (Zbkb)      <--> shfli      (NOT RATIFIED)
> -   unzip   (Zbkb)      <--> unshfli    (NOT RATIFIED)
> -   rev8    (Zbb/Zbkb)  <--> grevi      (NOT RATIFIED)
> 
> 
> 
> 
> PATCH 1: Reorganize riscv_disassemble_insn function
> 
> riscv_disassemble_insn function (before PATCH 1) works like this:
> 
>     FOREACH (opcode)
>     {
>         if (!opcode.MATCH(insn))
>             continue;
>         //
>         // Print insn with matched opcode.
>         //
>         return insnlen;
>     }
>     //
>     // opcode not matched.
>     //
> 
> Because instruction printing code is in the opcode loop, this is not
> good to fix the problem.  This patch reorganizes the function like this:
> 
>     matched_opcode = NULL;
>     // Step 1: opcode matching
>     FOREACH (opcode)
>     {
>         if (!opcode.MATCH(insn))
>             continue;
>         matched_opcode = opcode;
>         break;
>     }
>     // Step 2: opcode printing
>     if (matched_opcode != NULL)
>     {
>         //
>         // Print insn with matched_opcode.
>         //
>         return insnlen;
>     }
>     //
>     // opcode not matched.
>     //
> 
> PATCH 1 splits opcode loop to two separate steps: opcode matching and
> printing.  With this, we can modify opcode matching step to implement
> "generic subsets".
> 
> 
> 
> 
> PATCH 2: Implement "generic subsets"
> 
> With this commit, riscv_disassemble_insn function works like this:
> 
>     int masklen;
>     matched_opcode = NULL;
>     // Step 1: opcode matching
>     FOREACH (opcode)
>     {
>         if (!opcode.MATCH(insn))
>             continue;
>         if (!(opcode.pinfo & INSN_GENERICS))
>         {
>             matched_opcode = opcode;
>             break;  // Early-quit as before (for non generic subsets)
>         }
>         // Handling of generic subsets
>         if (matched_opcode == NULL)
>         {
>             matched_opcode = opcode;
>             // Set corresponding masklen.
>         }
>         else if (no_aliases)
>         {
>             // Prefer opcode with shortest mask.
>             // Update match_opcode and masklen where necessary.
>         }
>         else
>         {
>             // Prefer opcode with longest mask.
>             // Update match_opcode and masklen where necessary.
>         }
>     }
>     // Step 2: opcode printing (continues...)
> 
> PATCH 2 also adds INSN_GENERICS flag to two zext.h forms and pack/packw
> instructions.
> 
> 
> 
> 
> PATCH 3 contains additional testcases.
> 
> 
> 
> 
> Thanks,
> Tsukasa
> 
> 
> 
> 
> Tsukasa OI (3):
>   RISC-V: Split disasm opcode matching and printing
>   RISC-V: Implement "generic subsets" for disasm
>   RISC-V: Add testcases disassembling zext.h
> 
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s   |  2 +
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d   | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s   |  4 +
>  include/opcode/riscv.h                       |  4 +
>  opcodes/riscv-dis.c                          | 93 ++++++++++++--------
>  opcodes/riscv-opc.c                          |  8 +-
>  9 files changed, 114 insertions(+), 41 deletions(-)
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s
> 
> 
> base-commit: 625b6eae091709b95471eae92d42dc6bc71e6553

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-07-30  4:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-03 12:05 [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
2022-06-03 12:05 ` [PATCH 1/3] RISC-V: Split disasm opcode matching and printing Tsukasa OI
2022-06-03 12:05 ` [PATCH 2/3] RISC-V: Implement "generic subsets" for disasm Tsukasa OI
2022-06-03 12:05 ` [PATCH 3/3] RISC-V: Add testcases disassembling zext.h Tsukasa OI
2022-07-30  4:20 ` [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).