public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching)
@ 2022-11-20  1:08 Tsukasa OI
  2022-11-20  1:08 ` [PATCH 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-20  1:08 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

Hello,

This is the Part 3 of 4-part project to improve disassembler performance
drastically:
<https://github.com/a4lg/binutils-gdb/wiki/proj_dis_perf_improvements_1>

** this patchset does not apply to master directly. **

This patchset requires following patchset(s) to be applied first:
<https://sourceware.org/pipermail/binutils/2022-November/124378.html>


PATCH 1/3 improves performance on disassembling RISC-V code (which may also
possibly contain invalid data).  It replaces riscv_hash (on opcodes/
riscv-dis.c) with much faster data structure: sorted and partitioned
hash table.

This is a technique actually used on SPARC architecture
(opcodes/sparc-dis.c) and the author simplified the algorithm even further.
Unlike SPARC, RISC-V's hashed opcode table is not a table to linked lists,
it's just a table, pointing "start" elements in the sorted opcode list
(per hash code) and a global tail.

PATCH 3/3 takes care of per-instruction instruction support probing problem.
By caching which instruction classes are queried already, we no longer have
to call riscv_multi_subset_supports function for every instruction.  It
speeds up the disassembling even further.

PATCH 2/3 is not a part of the optimization but a safety net to complement
PATCH 1/3.  It enables implementing custom instructions that span through
multiple major opcodes (such as both CUSTOM_0 and CUSTOM_1 **in a single
instruction**) without causing disassembler functionality problems.  Note
that it has a big performance penalty if a vendor implements such
instruction so if such instruction is implemented in the mainline, a
separate solution will be required.


I benchmarked some of the programs and I usually get 20-50% performance
improvements while disassembling code section of compiled RISC-V ELF
programs ("objdump -d $FILE").  That is significant and pretty nice for such
a small modification (with about 12KB heap memory allocation on 64-bit
environment).  On libraries and big programs with many debug symbols, the
improvements are not that high but this is to be taken care with the next
part (the mapping symbol optimization).

This is not the end.

This structure significantly improves plain binary file handling (on
objdump, "objdump -b binary -m riscv:rv[32|64] -D $FILE").  I tested on
various binary files including random one and big vmlinux images and I
confirmed significant performance improvements (over 70% on many cases).

This is partially due to the fact that, disassembling about one quarter of
invalid "instruction" words required iterating over one thousand opcode
entries (348 or more being vector instructions with OP-V, that can be easily
skipped with this new data structure).  Another reason for this significance
is it doesn't have various ELF overhead.

It also has a great synergy with the commit "RISC-V: One time CSR hash table
initialization" and disassembling many CSR instructions is now over 6 times
faster (in contrast to only about 30% faster at the patchset part 2).


Thanks,
Tsukasa




Tsukasa OI (3):
  RISC-V: Use faster hash table on disassembling
  RISC-V: Fallback on faster hash table
  RISC-V: Cache instruction support

 include/opcode/riscv.h |   2 +
 opcodes/riscv-dis.c    | 129 +++++++++++++++++++++++++++++++++++------
 2 files changed, 113 insertions(+), 18 deletions(-)


base-commit: f3fcf98b44621fb8768cf11121d3fd77089bca5b
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] RISC-V: Use faster hash table on disassembling
  2022-11-20  1:08 [PATCH 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
@ 2022-11-20  1:08 ` Tsukasa OI
  2022-11-20  1:08 ` [PATCH 2/3] RISC-V: Fallback on faster hash table Tsukasa OI
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-20  1:08 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

From: Tsukasa OI <research_trasio@irq.a4lg.com>

This commit improves performance on disassembling RISC-V code.
It replaces riscv_hash (in opcodes/riscv-dis.c) with much faster data
structure: a sorted and partitioned hash table.

This is a technique actually used on SPARC architecture
(opcodes/sparc-dis.c) and the author simplified the algorithm even further.
Unlike SPARC, RISC-V's hashed opcode table is not a table to linked lists,
it's just a table, pointing "start" elements in the sorted opcode list
(per hash code) and a global tail.

It is expected to have 20-40% performance improvements when disassembling
linked RISC-V ELF programs using objdump.  That is a significant improvement
and pretty nice for such a small modification
(with about 12KB heap memory allocation on 64-bit environment).

This is not the end.  This structure significantly improves plain binary
file handling (on objdump, "objdump -b binary -m riscv:rv[32|64] -D $FILE").

The author tested on various binary files including random one and big
vmlinux images and confirmed significant performance improvements (>70%
on many cases).  This is partially due to the fact that, disassembling about
one quarter of invalid "instruction" words required iterating over one
thousand opcode entries (348 or more being vector instructions with OP-V,
that can be easily skipped with this new data structure).  Another reason
for this significance is it doesn't have various ELF overhead.

opcodes/ChangeLog:

	* riscv-dis.c (init_riscv_dis_state_for_arch_and_options): Build
	the hash table on the first run.
	(OP_HASH_LEN): Move from riscv_disassemble_insn.
	(OP_HASH_IDX): Move from riscv_disassemble_insn and mask by
	OP_MASK_OP2 == 0x03 for only real 16-bit instructions.
	(riscv_hash): New sorted and partitioned hash table.
	(riscv_opcodes_sorted): New sorted opcode table.
	(compare_opcodes): New function to compare RISC-V opcode entries.
	(build_riscv_opcodes_hash_table): New function to build faster
	hash table to disassemble.
	(riscv_disassemble_insn): Use sorted and partitioned hash table.
---
 opcodes/riscv-dis.c | 93 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 328a34501549..a4a74e5733a5 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -162,6 +162,8 @@ set_riscv_dis_arch_context (riscv_dis_arch_context_t *context,
 }
 \f
 
+static void build_riscv_opcodes_hash_table (void);
+
 /* Guess and update current XLEN.  */
 
 static void
@@ -205,6 +207,12 @@ init_riscv_dis_state_for_arch (void)
 static void
 init_riscv_dis_state_for_arch_and_options (void)
 {
+  static bool init = false;
+  if (!init)
+    {
+      build_riscv_opcodes_hash_table ();
+      init = true;
+    }
   /* If the architecture string is changed, update XLEN.  */
   if (is_arch_changed)
     update_riscv_dis_xlen (NULL);
@@ -818,6 +826,69 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
     }
 }
 
+/* Build a hash table for the disassembler to shorten the search time.
+   We sort riscv_opcodes entry pointers for further performance.
+   Hash index is computed by masking the instruction with...
+   - 0x03 (OP_MASK_OP2) for real 16-bit instructions
+   - 0x7f (OP_MASK_OP)  for all other instructions.  */
+
+#define OP_HASH_LEN (OP_MASK_OP + 1)
+#define OP_HASH_IDX(i)                                                        \
+  ((i) & (((i & OP_MASK_OP2) != OP_MASK_OP2) ? OP_MASK_OP2 : OP_MASK_OP))
+static const struct riscv_opcode **riscv_hash[OP_HASH_LEN + 1];
+static const struct riscv_opcode **riscv_opcodes_sorted;
+
+/* Compare two riscv_opcode* objects to sort by hash index.  */
+
+static int
+compare_opcodes (const void *ap, const void *bp)
+{
+  const struct riscv_opcode *a = *(const struct riscv_opcode **) ap;
+  const struct riscv_opcode *b = *(const struct riscv_opcode **) bp;
+  int ai = (int) OP_HASH_IDX (a->match);
+  int bi = (int) OP_HASH_IDX (b->match);
+  if (ai != bi)
+    return ai - bi;
+  /* Stable sort (on riscv_opcodes entry order) is required.  */
+  if (a < b)
+    return -1;
+  if (a > b)
+    return +1;
+  return 0;
+}
+
+/* Build riscv_opcodes-based hash table.  */
+
+static void
+build_riscv_opcodes_hash_table (void)
+{
+  const struct riscv_opcode *op;
+  const struct riscv_opcode **pop, **pop_end;
+  size_t len = 0;
+
+  /* Sort riscv_opcodes entry pointers (except macros).  */
+  for (op = riscv_opcodes; op->name; op++)
+    if (op->pinfo != INSN_MACRO)
+      len++;
+  riscv_opcodes_sorted = xcalloc (len, sizeof (struct riscv_opcode *));
+  pop_end = riscv_opcodes_sorted;
+  for (op = riscv_opcodes; op->name; op++)
+    if (op->pinfo != INSN_MACRO)
+      *pop_end++ = op;
+  qsort (riscv_opcodes_sorted, len, sizeof (struct riscv_opcode *),
+	 compare_opcodes);
+
+  /* Initialize faster hash table.  */
+  pop = riscv_opcodes_sorted;
+  for (unsigned i = 0; i < OP_HASH_LEN; i++)
+    {
+      riscv_hash[i] = pop;
+      while (pop != pop_end && OP_HASH_IDX ((*pop)->match) == i)
+	pop++;
+    }
+  riscv_hash[OP_HASH_LEN] = pop_end;
+}
+
 /* Print the RISC-V instruction at address MEMADDR in debugged memory,
    on using INFO.  Returns length of the instruction, in bytes.
    BIGENDIAN must be 1 if this is big-endian code, 0 if
@@ -826,24 +897,11 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 static int
 riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 {
+  const struct riscv_opcode **pop, **pop_end;
   const struct riscv_opcode *op, *matched_op;
-  static bool init = false;
-  static const struct riscv_opcode *riscv_hash[OP_MASK_OP + 1];
   struct riscv_private_data *pd = info->private_data;
   int insnlen;
 
-#define OP_HASH_IDX(i) ((i) & (riscv_insn_length (i) == 2 ? 0x3 : OP_MASK_OP))
-
-  /* Build a hash table to shorten the search time.  */
-  if (! init)
-    {
-      for (op = riscv_opcodes; op->name; op++)
-	if (!riscv_hash[OP_HASH_IDX (op->match)])
-	  riscv_hash[OP_HASH_IDX (op->match)] = op;
-
-      init = true;
-    }
-
   insnlen = riscv_insn_length (word);
 
   /* RISC-V instructions are always little-endian.  */
@@ -861,10 +919,11 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
   info->target2 = 0;
 
   matched_op = NULL;
-  op = riscv_hash[OP_HASH_IDX (word)];
-
-  for (; op && op->name; op++)
+  pop     = riscv_hash[OP_HASH_IDX (word)];
+  pop_end = riscv_hash[OP_HASH_IDX (word) + 1];
+  for (; pop != pop_end; pop++)
     {
+      op = *pop;
       /* Does the opcode match?  */
       if (!(op->match_func) (op, word))
 	continue;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/3] RISC-V: Fallback on faster hash table
  2022-11-20  1:08 [PATCH 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
  2022-11-20  1:08 ` [PATCH 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
@ 2022-11-20  1:08 ` Tsukasa OI
  2022-11-20  1:08 ` [PATCH 3/3] RISC-V: Cache instruction support Tsukasa OI
  2022-11-28  4:46 ` [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
  3 siblings, 0 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-20  1:08 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

From: Tsukasa OI <research_trasio@irq.a4lg.com>

Although it does not have a problem on current GNU Binutils implementation,
if the custom vendor implements an instruction which spans across multiple
major opcodes (e.g. uses both CUSTOM_0 and CUSTOM_1 in a *single* custom
instruction), the original assumption of the sorted hash table breaks.

In this case, this commit enables the fallback mode to disable all
optimizations except filtering macros out.

Note that, if a such instruction (that disables this disassembler
optimization) is upstreamed to Binutils, a separate solution will be
required to avoid major performance degradation when such instruction is
not used.  The intent of this commit is to make a room for custom vendors
to implement such instructions in *their* tree without causing
disassembler problems.

opcodes/ChangeLog:

	* riscv-dis.c (is_riscv_hash_fallback) New.
	(build_riscv_opcodes_hash_table): If an instruction spans across
	multiple major opcodes, enable fallback mode and disable sorting.
	(riscv_disassemble_insn): If the fallback mode is enabled, scan
	through all instructions instead of scanning only instruction
	entries matching the hash value.
---
 opcodes/riscv-dis.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index a4a74e5733a5..197f6a31d439 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -838,6 +838,9 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 static const struct riscv_opcode **riscv_hash[OP_HASH_LEN + 1];
 static const struct riscv_opcode **riscv_opcodes_sorted;
 
+/* Whether the fallback should be used.  */
+static bool is_riscv_hash_fallback = false;
+
 /* Compare two riscv_opcode* objects to sort by hash index.  */
 
 static int
@@ -868,15 +871,25 @@ build_riscv_opcodes_hash_table (void)
 
   /* Sort riscv_opcodes entry pointers (except macros).  */
   for (op = riscv_opcodes; op->name; op++)
-    if (op->pinfo != INSN_MACRO)
+    {
+      if (op->pinfo == INSN_MACRO)
+	continue;
       len++;
+      if (is_riscv_hash_fallback)
+	continue;
+      if (OP_HASH_IDX (op->match) < OP_MASK_OP2
+	      ? (op->mask & OP_MASK_OP2) != OP_MASK_OP2
+	      : (op->mask & OP_MASK_OP)  != OP_MASK_OP)
+	is_riscv_hash_fallback = true;
+    }
   riscv_opcodes_sorted = xcalloc (len, sizeof (struct riscv_opcode *));
   pop_end = riscv_opcodes_sorted;
   for (op = riscv_opcodes; op->name; op++)
     if (op->pinfo != INSN_MACRO)
       *pop_end++ = op;
-  qsort (riscv_opcodes_sorted, len, sizeof (struct riscv_opcode *),
-	 compare_opcodes);
+  if (!is_riscv_hash_fallback)
+    qsort (riscv_opcodes_sorted, len, sizeof (struct riscv_opcode *),
+	   compare_opcodes);
 
   /* Initialize faster hash table.  */
   pop = riscv_opcodes_sorted;
@@ -919,8 +932,16 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
   info->target2 = 0;
 
   matched_op = NULL;
-  pop     = riscv_hash[OP_HASH_IDX (word)];
-  pop_end = riscv_hash[OP_HASH_IDX (word) + 1];
+  if (!is_riscv_hash_fallback)
+    {
+      pop     = riscv_hash[OP_HASH_IDX (word)];
+      pop_end = riscv_hash[OP_HASH_IDX (word) + 1];
+    }
+  else
+    {
+      pop     = riscv_hash[0];
+      pop_end = riscv_hash[OP_HASH_LEN];
+    }
   for (; pop != pop_end; pop++)
     {
       op = *pop;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/3] RISC-V: Cache instruction support
  2022-11-20  1:08 [PATCH 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
  2022-11-20  1:08 ` [PATCH 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
  2022-11-20  1:08 ` [PATCH 2/3] RISC-V: Fallback on faster hash table Tsukasa OI
@ 2022-11-20  1:08 ` Tsukasa OI
  2022-11-28  4:46 ` [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
  3 siblings, 0 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-20  1:08 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

From: Tsukasa OI <research_trasio@irq.a4lg.com>

Calling riscv_subset_supports repeatedly harms the performance in a
measurable way (about 3-13% in total on the most cases).

As a simple solution, this commit now caches instruction class support
(whether specific instruction class is supported) as a signed char array.

It is expected to have 5-7% performance improvements when disassembling
linked RISC-V ELF programs using objdump but this is particularly effective
with programs with many CSR instructions (up to ~42% on the author's PC).

include/ChangeLog:

	* opcode/riscv.h (enum riscv_insn_class): Add NUM_INSN_CLASSES.

opcodes/ChangeLog:

	* riscv-dis.c (riscv_insn_support_cache) New.
	(init_riscv_dis_state_for_arch): Clear the instruction support
	cache.  (riscv_disassemble_insn): Cache the instruction support.
---
 include/opcode/riscv.h |  2 ++
 opcodes/riscv-dis.c    | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/opcode/riscv.h b/include/opcode/riscv.h
index c3cbde600cb0..6a029a1034e1 100644
--- a/include/opcode/riscv.h
+++ b/include/opcode/riscv.h
@@ -422,6 +422,8 @@ enum riscv_insn_class
   INSN_CLASS_XTHEADMEMIDX,
   INSN_CLASS_XTHEADMEMPAIR,
   INSN_CLASS_XTHEADSYNC,
+
+  NUM_INSN_CLASSES,
 };
 
 /* This structure holds information for a particular instruction.  */
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 197f6a31d439..32e7b1174436 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -107,6 +107,9 @@ static bool no_aliases = false;
 
 /* If set, disassemble with numeric register names.  */
 static bool is_numeric = false;
+
+/* Instruction support cache.  */
+static signed char riscv_insn_support_cache[NUM_INSN_CLASSES];
 \f
 
 /* Set current disassembler context (dis_arch_context_current).
@@ -200,6 +203,9 @@ static void
 init_riscv_dis_state_for_arch (void)
 {
   is_arch_changed = true;
+  /* Clear instruction support cache.  */
+  for (size_t i = 0; i < NUM_INSN_CLASSES; i++)
+    riscv_insn_support_cache[i] = 0;
 }
 
 /* Initialization (for arch and options).  */
@@ -955,7 +961,14 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
       if ((op->xlen_requirement != 0) && (op->xlen_requirement != xlen))
 	continue;
       /* Is this instruction supported by the current architecture?  */
-      if (!riscv_multi_subset_supports (&riscv_rps_dis, op->insn_class))
+      if (riscv_insn_support_cache[op->insn_class] == 0)
+	{
+	  riscv_insn_support_cache[op->insn_class]
+	      = riscv_multi_subset_supports (&riscv_rps_dis, op->insn_class)
+		    ? +1
+		    : -1;
+	}
+      if (riscv_insn_support_cache[op->insn_class] < 0)
 	continue;
 
       matched_op = op;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching)
  2022-11-20  1:08 [PATCH 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
                   ` (2 preceding siblings ...)
  2022-11-20  1:08 ` [PATCH 3/3] RISC-V: Cache instruction support Tsukasa OI
@ 2022-11-28  4:46 ` Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
                     ` (2 more replies)
  3 siblings, 3 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-28  4:46 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

Hello,

This is the Part 3 of 4-part project to improve disassembler performance
drastically:
<https://github.com/a4lg/binutils-gdb/wiki/proj_dis_perf_improvements_1>

** this patchset does not apply to master directly. **

This patchset requires following patchset(s) to be applied first:
<https://sourceware.org/pipermail/binutils/2022-November/124378.html>

[Changes: v1 -> v2]
-   Rebased against commit 97f006bc56af:
    "RISC-V: Better support for long instructions (disassembler)"


PATCH 1/3 improves performance on disassembling RISC-V code (which may also
possibly contain invalid data).  It replaces riscv_hash (on opcodes/
riscv-dis.c) with much faster data structure: sorted and partitioned
hash table.

This is a technique actually used on SPARC architecture
(opcodes/sparc-dis.c) and the author simplified the algorithm even further.
Unlike SPARC, RISC-V's hashed opcode table is not a table to linked lists,
it's just a table, pointing "start" elements in the sorted opcode list
(per hash code) and a global tail.

PATCH 3/3 takes care of per-instruction instruction support probing problem.
By caching which instruction classes are queried already, we no longer have
to call riscv_multi_subset_supports function for every instruction.  It
speeds up the disassembling even further.

PATCH 2/3 is not a part of the optimization but a safety net to complement
PATCH 1/3.  It enables implementing custom instructions that span through
multiple major opcodes (such as both CUSTOM_0 and CUSTOM_1 **in a single
instruction**) without causing disassembler functionality problems.  Note
that it has a big performance penalty if a vendor implements such
instruction so if such instruction is implemented in the mainline, a
separate solution will be required.


I benchmarked some of the programs and I usually get 20-50% performance
improvements while disassembling code section of compiled RISC-V ELF
programs ("objdump -d $FILE").  That is significant and pretty nice for such
a small modification (with about 12KB heap memory allocation on 64-bit
environment).  On libraries and big programs with many debug symbols, the
improvements are not that high but this is to be taken care with the next
part (the mapping symbol optimization).

This is not the end.

This structure significantly improves plain binary file handling (on
objdump, "objdump -b binary -m riscv:rv[32|64] -D $FILE").  I tested on
various binary files including random one and big vmlinux images and I
confirmed significant performance improvements (over 70% on many cases).

This is partially due to the fact that, disassembling about one quarter of
invalid "instruction" words required iterating over one thousand opcode
entries (348 or more being vector instructions with OP-V, that can be easily
skipped with this new data structure).  Another reason for this significance
is it doesn't have various ELF overhead.

It also has a great synergy with the commit "RISC-V: One time CSR hash table
initialization" and disassembling many CSR instructions is now over 6 times
faster (in contrast to only about 30% faster at the patchset part 2).


Thanks,
Tsukasa




Tsukasa OI (3):
  RISC-V: Use faster hash table on disassembling
  RISC-V: Fallback on faster hash table
  RISC-V: Cache instruction support

 include/opcode/riscv.h |   2 +
 opcodes/riscv-dis.c    | 129 +++++++++++++++++++++++++++++++++++------
 2 files changed, 113 insertions(+), 18 deletions(-)


base-commit: 75d9e5b0cfebcce34c855e4c98a956de4d7d7753
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] RISC-V: Use faster hash table on disassembling
  2022-11-28  4:46 ` [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
@ 2022-11-28  4:46   ` Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 2/3] RISC-V: Fallback on faster hash table Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 3/3] RISC-V: Cache instruction support Tsukasa OI
  2 siblings, 0 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-28  4:46 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

From: Tsukasa OI <research_trasio@irq.a4lg.com>

This commit improves performance on disassembling RISC-V code.
It replaces riscv_hash (in opcodes/riscv-dis.c) with much faster data
structure: a sorted and partitioned hash table.

This is a technique actually used on SPARC architecture
(opcodes/sparc-dis.c) and the author simplified the algorithm even further.
Unlike SPARC, RISC-V's hashed opcode table is not a table to linked lists,
it's just a table, pointing "start" elements in the sorted opcode list
(per hash code) and a global tail.

It is expected to have 20-40% performance improvements when disassembling
linked RISC-V ELF programs using objdump.  That is a significant improvement
and pretty nice for such a small modification
(with about 12KB heap memory allocation on 64-bit environment).

This is not the end.  This structure significantly improves plain binary
file handling (on objdump, "objdump -b binary -m riscv:rv[32|64] -D $FILE").

The author tested on various binary files including random one and big
vmlinux images and confirmed significant performance improvements (>70%
on many cases).  This is partially due to the fact that, disassembling about
one quarter of invalid "instruction" words required iterating over one
thousand opcode entries (348 or more being vector instructions with OP-V,
that can be easily skipped with this new data structure).  Another reason
for this significance is it doesn't have various ELF overhead.

opcodes/ChangeLog:

	* riscv-dis.c (init_riscv_dis_state_for_arch_and_options): Build
	the hash table on the first run.
	(OP_HASH_LEN): Move from riscv_disassemble_insn.
	(OP_HASH_IDX): Move from riscv_disassemble_insn and mask by
	OP_MASK_OP2 == 0x03 for only real 16-bit instructions.
	(riscv_hash): New sorted and partitioned hash table.
	(riscv_opcodes_sorted): New sorted opcode table.
	(compare_opcodes): New function to compare RISC-V opcode entries.
	(build_riscv_opcodes_hash_table): New function to build faster
	hash table to disassemble.
	(riscv_disassemble_insn): Use sorted and partitioned hash table.
---
 opcodes/riscv-dis.c | 93 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index b9309c092b17..4267b3ccf88c 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -162,6 +162,8 @@ set_riscv_dis_arch_context (riscv_dis_arch_context_t *context,
 }
 \f
 
+static void build_riscv_opcodes_hash_table (void);
+
 /* Guess and update current XLEN.  */
 
 static void
@@ -205,6 +207,12 @@ init_riscv_dis_state_for_arch (void)
 static void
 init_riscv_dis_state_for_arch_and_options (void)
 {
+  static bool init = false;
+  if (!init)
+    {
+      build_riscv_opcodes_hash_table ();
+      init = true;
+    }
   /* If the architecture string is changed, update XLEN.  */
   if (is_arch_changed)
     update_riscv_dis_xlen (NULL);
@@ -818,6 +826,69 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
     }
 }
 
+/* Build a hash table for the disassembler to shorten the search time.
+   We sort riscv_opcodes entry pointers for further performance.
+   Hash index is computed by masking the instruction with...
+   - 0x03 (OP_MASK_OP2) for real 16-bit instructions
+   - 0x7f (OP_MASK_OP)  for all other instructions.  */
+
+#define OP_HASH_LEN (OP_MASK_OP + 1)
+#define OP_HASH_IDX(i)                                                        \
+  ((i) & (((i & OP_MASK_OP2) != OP_MASK_OP2) ? OP_MASK_OP2 : OP_MASK_OP))
+static const struct riscv_opcode **riscv_hash[OP_HASH_LEN + 1];
+static const struct riscv_opcode **riscv_opcodes_sorted;
+
+/* Compare two riscv_opcode* objects to sort by hash index.  */
+
+static int
+compare_opcodes (const void *ap, const void *bp)
+{
+  const struct riscv_opcode *a = *(const struct riscv_opcode **) ap;
+  const struct riscv_opcode *b = *(const struct riscv_opcode **) bp;
+  int ai = (int) OP_HASH_IDX (a->match);
+  int bi = (int) OP_HASH_IDX (b->match);
+  if (ai != bi)
+    return ai - bi;
+  /* Stable sort (on riscv_opcodes entry order) is required.  */
+  if (a < b)
+    return -1;
+  if (a > b)
+    return +1;
+  return 0;
+}
+
+/* Build riscv_opcodes-based hash table.  */
+
+static void
+build_riscv_opcodes_hash_table (void)
+{
+  const struct riscv_opcode *op;
+  const struct riscv_opcode **pop, **pop_end;
+  size_t len = 0;
+
+  /* Sort riscv_opcodes entry pointers (except macros).  */
+  for (op = riscv_opcodes; op->name; op++)
+    if (op->pinfo != INSN_MACRO)
+      len++;
+  riscv_opcodes_sorted = xcalloc (len, sizeof (struct riscv_opcode *));
+  pop_end = riscv_opcodes_sorted;
+  for (op = riscv_opcodes; op->name; op++)
+    if (op->pinfo != INSN_MACRO)
+      *pop_end++ = op;
+  qsort (riscv_opcodes_sorted, len, sizeof (struct riscv_opcode *),
+	 compare_opcodes);
+
+  /* Initialize faster hash table.  */
+  pop = riscv_opcodes_sorted;
+  for (unsigned i = 0; i < OP_HASH_LEN; i++)
+    {
+      riscv_hash[i] = pop;
+      while (pop != pop_end && OP_HASH_IDX ((*pop)->match) == i)
+	pop++;
+    }
+  riscv_hash[OP_HASH_LEN] = pop_end;
+}
+
 /* Print the RISC-V instruction at address MEMADDR in debugged memory,
    on using INFO.  Returns length of the instruction, in bytes.
    BIGENDIAN must be 1 if this is big-endian code, 0 if
@@ -829,24 +900,11 @@ riscv_disassemble_insn (bfd_vma memaddr,
 			const bfd_byte *packet,
 			disassemble_info *info)
 {
+  const struct riscv_opcode **pop, **pop_end;
   const struct riscv_opcode *op, *matched_op;
-  static bool init = false;
-  static const struct riscv_opcode *riscv_hash[OP_MASK_OP + 1];
   struct riscv_private_data *pd = info->private_data;
   int insnlen;
 
-#define OP_HASH_IDX(i) ((i) & (riscv_insn_length (i) == 2 ? 0x3 : OP_MASK_OP))
-
-  /* Build a hash table to shorten the search time.  */
-  if (! init)
-    {
-      for (op = riscv_opcodes; op->name; op++)
-	if (!riscv_hash[OP_HASH_IDX (op->match)])
-	  riscv_hash[OP_HASH_IDX (op->match)] = op;
-
-      init = true;
-    }
-
   insnlen = riscv_insn_length (word);
 
   /* RISC-V instructions are always little-endian.  */
@@ -864,10 +922,11 @@ riscv_disassemble_insn (bfd_vma memaddr,
   info->target2 = 0;
 
   matched_op = NULL;
-  op = riscv_hash[OP_HASH_IDX (word)];
-
-  for (; op && op->name; op++)
+  pop     = riscv_hash[OP_HASH_IDX (word)];
+  pop_end = riscv_hash[OP_HASH_IDX (word) + 1];
+  for (; pop != pop_end; pop++)
     {
+      op = *pop;
       /* Does the opcode match?  */
       if (!(op->match_func) (op, word))
 	continue;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 2/3] RISC-V: Fallback on faster hash table
  2022-11-28  4:46 ` [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
@ 2022-11-28  4:46   ` Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 3/3] RISC-V: Cache instruction support Tsukasa OI
  2 siblings, 0 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-28  4:46 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

From: Tsukasa OI <research_trasio@irq.a4lg.com>

Although it does not have a problem on current GNU Binutils implementation,
if the custom vendor implements an instruction which spans across multiple
major opcodes (e.g. uses both CUSTOM_0 and CUSTOM_1 in a *single* custom
instruction), the original assumption of the sorted hash table breaks.

In this case, this commit enables the fallback mode to disable all
optimizations except filtering macros out.

Note that, if a such instruction (that disables this disassembler
optimization) is upstreamed to Binutils, a separate solution will be
required to avoid major performance degradation when such instruction is
not used.  The intent of this commit is to make a room for custom vendors
to implement such instructions in *their* tree without causing
disassembler problems.

opcodes/ChangeLog:

	* riscv-dis.c (is_riscv_hash_fallback) New.
	(build_riscv_opcodes_hash_table): If an instruction spans across
	multiple major opcodes, enable fallback mode and disable sorting.
	(riscv_disassemble_insn): If the fallback mode is enabled, scan
	through all instructions instead of scanning only instruction
	entries matching the hash value.
---
 opcodes/riscv-dis.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 4267b3ccf88c..3f6cbf5a3680 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -838,6 +838,9 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 static const struct riscv_opcode **riscv_hash[OP_HASH_LEN + 1];
 static const struct riscv_opcode **riscv_opcodes_sorted;
 
+/* Whether the fallback should be used.  */
+static bool is_riscv_hash_fallback = false;
+
 /* Compare two riscv_opcode* objects to sort by hash index.  */
 
 static int
@@ -868,15 +871,25 @@ build_riscv_opcodes_hash_table (void)
 
   /* Sort riscv_opcodes entry pointers (except macros).  */
   for (op = riscv_opcodes; op->name; op++)
-    if (op->pinfo != INSN_MACRO)
+    {
+      if (op->pinfo == INSN_MACRO)
+	continue;
       len++;
+      if (is_riscv_hash_fallback)
+	continue;
+      if (OP_HASH_IDX (op->match) < OP_MASK_OP2
+	      ? (op->mask & OP_MASK_OP2) != OP_MASK_OP2
+	      : (op->mask & OP_MASK_OP)  != OP_MASK_OP)
+	is_riscv_hash_fallback = true;
+    }
   riscv_opcodes_sorted = xcalloc (len, sizeof (struct riscv_opcode *));
   pop_end = riscv_opcodes_sorted;
   for (op = riscv_opcodes; op->name; op++)
     if (op->pinfo != INSN_MACRO)
       *pop_end++ = op;
-  qsort (riscv_opcodes_sorted, len, sizeof (struct riscv_opcode *),
-	 compare_opcodes);
+  if (!is_riscv_hash_fallback)
+    qsort (riscv_opcodes_sorted, len, sizeof (struct riscv_opcode *),
+	   compare_opcodes);
 
   /* Initialize faster hash table.  */
   pop = riscv_opcodes_sorted;
@@ -922,8 +935,16 @@ riscv_disassemble_insn (bfd_vma memaddr,
   info->target2 = 0;
 
   matched_op = NULL;
-  pop     = riscv_hash[OP_HASH_IDX (word)];
-  pop_end = riscv_hash[OP_HASH_IDX (word) + 1];
+  if (!is_riscv_hash_fallback)
+    {
+      pop     = riscv_hash[OP_HASH_IDX (word)];
+      pop_end = riscv_hash[OP_HASH_IDX (word) + 1];
+    }
+  else
+    {
+      pop     = riscv_hash[0];
+      pop_end = riscv_hash[OP_HASH_LEN];
+    }
   for (; pop != pop_end; pop++)
     {
       op = *pop;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 3/3] RISC-V: Cache instruction support
  2022-11-28  4:46 ` [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
  2022-11-28  4:46   ` [PATCH v2 2/3] RISC-V: Fallback on faster hash table Tsukasa OI
@ 2022-11-28  4:46   ` Tsukasa OI
  2 siblings, 0 replies; 8+ messages in thread
From: Tsukasa OI @ 2022-11-28  4:46 UTC (permalink / raw)
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

From: Tsukasa OI <research_trasio@irq.a4lg.com>

Calling riscv_subset_supports repeatedly harms the performance in a
measurable way (about 3-13% in total on the most cases).

As a simple solution, this commit now caches instruction class support
(whether specific instruction class is supported) as a signed char array.

It is expected to have 5-7% performance improvements when disassembling
linked RISC-V ELF programs using objdump but this is particularly effective
with programs with many CSR instructions (up to ~42% on the author's PC).

include/ChangeLog:

	* opcode/riscv.h (enum riscv_insn_class): Add NUM_INSN_CLASSES.

opcodes/ChangeLog:

	* riscv-dis.c (riscv_insn_support_cache) New.
	(init_riscv_dis_state_for_arch): Clear the instruction support
	cache.  (riscv_disassemble_insn): Cache the instruction support.
---
 include/opcode/riscv.h |  2 ++
 opcodes/riscv-dis.c    | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/opcode/riscv.h b/include/opcode/riscv.h
index c3cbde600cb0..6a029a1034e1 100644
--- a/include/opcode/riscv.h
+++ b/include/opcode/riscv.h
@@ -422,6 +422,8 @@ enum riscv_insn_class
   INSN_CLASS_XTHEADMEMIDX,
   INSN_CLASS_XTHEADMEMPAIR,
   INSN_CLASS_XTHEADSYNC,
+
+  NUM_INSN_CLASSES,
 };
 
 /* This structure holds information for a particular instruction.  */
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 3f6cbf5a3680..eb3e64816bf6 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -107,6 +107,9 @@ static bool no_aliases = false;
 
 /* If set, disassemble with numeric register names.  */
 static bool is_numeric = false;
+
+/* Instruction support cache.  */
+static signed char riscv_insn_support_cache[NUM_INSN_CLASSES];
 \f
 
 /* Set current disassembler context (dis_arch_context_current).
@@ -200,6 +203,9 @@ static void
 init_riscv_dis_state_for_arch (void)
 {
   is_arch_changed = true;
+  /* Clear instruction support cache.  */
+  for (size_t i = 0; i < NUM_INSN_CLASSES; i++)
+    riscv_insn_support_cache[i] = 0;
 }
 
 /* Initialization (for arch and options).  */
@@ -958,7 +964,14 @@ riscv_disassemble_insn (bfd_vma memaddr,
       if ((op->xlen_requirement != 0) && (op->xlen_requirement != xlen))
 	continue;
       /* Is this instruction supported by the current architecture?  */
-      if (!riscv_multi_subset_supports (&riscv_rps_dis, op->insn_class))
+      if (riscv_insn_support_cache[op->insn_class] == 0)
+	{
+	  riscv_insn_support_cache[op->insn_class]
+	      = riscv_multi_subset_supports (&riscv_rps_dis, op->insn_class)
+		    ? +1
+		    : -1;
+	}
+      if (riscv_insn_support_cache[op->insn_class] < 0)
 	continue;
 
       matched_op = op;
-- 
2.38.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-11-28  4:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-20  1:08 [PATCH 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
2022-11-20  1:08 ` [PATCH 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
2022-11-20  1:08 ` [PATCH 2/3] RISC-V: Fallback on faster hash table Tsukasa OI
2022-11-20  1:08 ` [PATCH 3/3] RISC-V: Cache instruction support Tsukasa OI
2022-11-28  4:46 ` [PATCH v2 0/3] RISC-V: Disassembler Core Optimization 1-1 (Hash table and Caching) Tsukasa OI
2022-11-28  4:46   ` [PATCH v2 1/3] RISC-V: Use faster hash table on disassembling Tsukasa OI
2022-11-28  4:46   ` [PATCH v2 2/3] RISC-V: Fallback on faster hash table Tsukasa OI
2022-11-28  4:46   ` [PATCH v2 3/3] RISC-V: Cache instruction support Tsukasa OI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).