public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: Tsukasa OI <research_trasio@irq.a4lg.com>
To: Tsukasa OI <research_trasio@irq.a4lg.com>,
	Nelson Chu <nelson@rivosinc.com>,
	Kito Cheng <kito.cheng@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>
Cc: binutils@sourceware.org
Subject: [PATCH v2 10/11] RISC-V: Reorganize arch-related initialization and management
Date: Mon, 28 Nov 2022 04:43:45 +0000	[thread overview]
Message-ID: <0e328130d90af42f30e3632fadd6c48e422b7b82.1669610611.git.research_trasio@irq.a4lg.com> (raw)
In-Reply-To: <cover.1669610611.git.research_trasio@irq.a4lg.com>

From: Tsukasa OI <research_trasio@irq.a4lg.com>

objdump reuses the disassembler function returned by the disassembler
function.  That's good and benefits well from various optimizations.

However, by default, GDB (default_print_insn in gdb/arch-utils.c) assumes
that the disassembler function logic is simple and calls that function for
every instruction to be disassembled.  This is clearly a waste of time
because it probes BFD (ELF information) and re-initializes
riscv_rps_dis for every instruction.

After the support of mapping symbol with ISA string with commit 40f1a1a4564b
("RISC-V: Output mapping symbols with ISA string."), this kind of per-
instruction initialization of riscv_rps_dis can also occur on ELF files with
mapping symbol + ISA string (in riscv_get_map_state function).  It can be
worse if the object is assembled using Binutils commit 0ce50fc900a5
("RISC-V: Always generate mapping symbols at the start of the sections.")
or later.

To avoid repeated initialization, this commit

-   Caches the default / override architectures (in two "contexts") and
-   enables switching between them.

riscv_dis_arch_context_t and new utility functions are defined for this
purpose.  We still have to read the ".riscv.attributes" section on every
instruction on GDB but at least the most time-consuming part (updating the
actual architecture from the architecture string) is avoided.
Likewise, we still have to probe whether we have to update the architecture
for every instruction with a mapping symbol with ISA string is suitable but
at least we don't actually update the architecture unless necessary
(either context itself or the ISA string of the current context is changed).

This commit improves the disassembler performance well in those situations:

-   When long "disas" command is used on GDB
-   When ELF files with mapping symbols + ISA string is used

This commit now implements new mapping symbol handling ("$x" means revert
to the previous architecture [with "$x+arch"] if exists, otherwise revert to
the default one usually read from an ELF attribute), the recent consensus
made by Kito and Nelson.

On the benchmark using GDB batch files, the author measured significant
performance improvements (35-96% on various big RISC-V programs).
Unfortunately, on interactive usecases of GDB, this improvement is rarely
observable since we don't usually disassemble such a big chunk at once and
the current disassembler is not very slow.

On the benchmark using unstripped ELF files with mapping symbols + ISA
string "$xrv...", performance improvements are significant and easily
observable in the real world (150%-264% performance improvments).

Aside from optimization, this commit, along with "RISC-V: Reorganize
disassembler state initialization", makes state initialization clearer and
makes further changes easier.

Also, although not practical in the real world, this commit now allows
multi-XLEN object disassembling if the object file has mapping symbols
with ISA string and the machine is XLEN-neutral (e.g. objdump with
"-m riscv" option).  It may help testing Binutils / GAS.

opcodes/ChangeLog:

	* riscv-dis.c
	(initial_default_arch): Special default architecture
	string which is handled separately.
	(riscv_dis_arch_context_t): New type to manage RISC-V architecture
	context for the disassembler.  Two instance of this type is
	defined in this file - "default" and "override".
	(dis_arch_context_default): New.  Architecture context inferred
	from either an ELF attribute or initial_default_arch.
	(dis_arch_context_override): New.  Architecture context inferred
	from mapping symbols with ISA string.
	(dis_arch_context_current): New.  A pointer to either
	dis_arch_context_default or dis_arch_context_override.
	(riscv_rps_dis): Add summary.  Use initial values from the initial
	value of dis_arch_context_current - dis_arch_context_default.
	(from_last_map_symbol): Make it file scope to decide whether we
	should revert the architecture to the default in
	riscv_get_map_state function.
	(set_riscv_current_dis_arch_context): New function to update
	riscv_rps_dis and dis_arch_context_current.
	(set_riscv_dis_arch_context): New function to update the
	architecture for the given context.
	(update_riscv_dis_xlen): Consider dis_arch_context_current->xlen
	when guessing correct XLEN.
	(is_arch_changed): New.  Set to true if the architecture is
	changed.
	(init_riscv_dis_state_for_arch): New function to track whether
	the architecture string is changed.
	(init_riscv_dis_state_for_arch_and_options): Keep track of the
	architecture string change and update XLEN if it has changed.
	(update_riscv_dis_arch): New function to set both the architecture
	and the context.  Call initialization functions if needed.
	(riscv_get_map_state): Add update argument.  Keep track of the
	mapping symbols with ISA string and update the architecture and
	the context if required.
	(riscv_search_mapping_symbol): Move from_last_map_symbol to
	file scope.  Call riscv_get_map_state function with architecture
	and context updates enabled.
	(riscv_data_length): Call riscv_get_map_state function with
	architecture and context updates disabled.
	(riscv_get_disassembler): Add an error handling on Tag_RISCV_arch.
	Call update_riscv_dis_arch function to update the architecture
	and the context.

Co-developed-by: Nelson Chu <nelson@rivosinc.com>
---
 opcodes/riscv-dis.c | 174 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 152 insertions(+), 22 deletions(-)

diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 174b6e180543..d48448bb3597 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -32,6 +32,9 @@
 #include <stdint.h>
 #include <ctype.h>
 
+/* Default architecture string (if not available).  */
+static const char *const initial_default_arch = "rv64gc";
+
 /* Current XLEN for the disassembler.  */
 static unsigned xlen = 0;
 
@@ -48,14 +51,36 @@ static enum riscv_spec_class default_isa_spec = ISA_SPEC_CLASS_DRAFT - 1;
    (as specified by the ELF attributes or the `priv-spec' option).  */
 static enum riscv_spec_class default_priv_spec = PRIV_SPEC_CLASS_NONE;
 
-static riscv_subset_list_t riscv_subsets;
+/* RISC-V disassembler architecture context type.  */
+typedef struct
+{
+  const char *arch_str;
+  const char *default_arch;
+  riscv_subset_list_t subsets;
+  unsigned xlen;
+  bool no_xlen_if_default;
+} riscv_dis_arch_context_t;
+
+/* Context: default (either initial_default_arch or ELF attribute).  */
+static riscv_dis_arch_context_t dis_arch_context_default
+    = { NULL, initial_default_arch, {}, 0, true };
+
+/* Context: override (mapping symbols with ISA string). */
+static riscv_dis_arch_context_t dis_arch_context_override
+    = { NULL, NULL, {}, 0, false };
+
+/* Pointer to the current disassembler architecture context.  */
+static riscv_dis_arch_context_t *dis_arch_context_current
+    = &dis_arch_context_default;
+
+/* RISC-V ISA string parser structure (current).  */
 static riscv_parse_subset_t riscv_rps_dis =
 {
-  &riscv_subsets,	/* subset_list.  */
-  opcodes_error_handler,/* error_handler.  */
-  &xlen,		/* xlen.  */
-  &default_isa_spec,	/* isa_spec.  */
-  false,		/* check_unknown_prefixed_ext.  */
+  &(dis_arch_context_default.subsets),	/* subset_list.  */
+  opcodes_error_handler,		/* error_handler.  */
+  &(dis_arch_context_default.xlen),	/* xlen.  */
+  &default_isa_spec,			/* isa_spec.  */
+  false,				/* check_unknown_prefixed_ext.  */
 };
 
 /* Private data structure for the RISC-V disassembler.  */
@@ -71,6 +96,7 @@ struct riscv_private_data
 /* Used for mapping symbols.  */
 static int last_map_symbol = -1;
 static bfd_vma last_stop_offset = 0;
+static bool from_last_map_symbol = false;
 
 /* Register names as used by the disassembler.  */
 static const char * const *riscv_gpr_names;
@@ -83,6 +109,59 @@ static bool no_aliases = false;
 static bool is_numeric = false;
 \f
 
+/* Set current disassembler context (dis_arch_context_current).
+   Return true if successfully updated.  */
+
+static bool
+set_riscv_current_dis_arch_context (riscv_dis_arch_context_t* context)
+{
+  if (context == dis_arch_context_current)
+    return false;
+  dis_arch_context_current = context;
+  riscv_rps_dis.subset_list = &(context->subsets);
+  riscv_rps_dis.xlen = &(context->xlen);
+  return true;
+}
+
+/* Update riscv_dis_arch_context_t by an ISA string.
+   Return true if the architecture is updated by arch.  */
+
+static bool
+set_riscv_dis_arch_context (riscv_dis_arch_context_t *context,
+			    const char *arch)
+{
+  /* Check whether the architecture is changed and
+     return false if the architecture will not be changed.  */
+  if (context->arch_str)
+    {
+      if (context->default_arch && arch == context->default_arch)
+	return false;
+      if (strcmp (context->arch_str, arch) == 0)
+	return false;
+    }
+  /* Update architecture string.  */
+  if (context->arch_str != context->default_arch)
+    free ((void *) context->arch_str);
+  context->arch_str = (arch != context->default_arch)
+			    ? xstrdup (arch)
+			    : context->default_arch;
+  /* Update other contents (subset list and XLEN).  */
+  riscv_subset_list_t *prev_subsets = riscv_rps_dis.subset_list;
+  unsigned *prev_xlen = riscv_rps_dis.xlen;
+  riscv_rps_dis.subset_list = &(context->subsets);
+  riscv_rps_dis.xlen = &(context->xlen);
+  context->xlen = 0;
+  riscv_release_subset_list (&context->subsets);
+  riscv_parse_subset (&riscv_rps_dis, context->arch_str);
+  riscv_rps_dis.subset_list = prev_subsets;
+  riscv_rps_dis.xlen = prev_xlen;
+  /* Special handling on the default architecture.  */
+  if (context->no_xlen_if_default && arch == context->default_arch)
+    context->xlen = 0;
+  return true;
+}
+\f
+
 /* Guess and update current XLEN.  */
 
 static void
@@ -95,9 +174,13 @@ update_riscv_dis_xlen (struct disassemble_info *info)
        This is only effective if XLEN-specific BFD machine architecture is
        chosen.  If XLEN-neutral (like riscv), BFD machine architecture is
        ignored on XLEN selection.
-     2. ELF class in dummy ELF header.  */
+     2. Non-default RISC-V architecture string set by either an ELF
+	attribute or a mapping symbol with ISA string.
+     3. ELF class in dummy ELF header.  */
   if (xlen_by_mach != 0)
     xlen = xlen_by_mach;
+  else if (dis_arch_context_current->xlen != 0)
+    xlen = dis_arch_context_current->xlen;
   else if (xlen_by_elf != 0)
     xlen = xlen_by_elf;
   else if (info != NULL && info->section != NULL)
@@ -107,11 +190,24 @@ update_riscv_dis_xlen (struct disassemble_info *info)
     }
 }
 
+/* Initialization (for arch).  */
+
+static bool is_arch_changed = false;
+
+static void
+init_riscv_dis_state_for_arch (void)
+{
+  is_arch_changed = true;
+}
+
 /* Initialization (for arch and options).  */
 
 static void
 init_riscv_dis_state_for_arch_and_options (void)
 {
+  /* If the architecture string is changed, update XLEN.  */
+  if (is_arch_changed)
+    update_riscv_dis_xlen (NULL);
   /* Set GPR register names to disassemble.  */
   riscv_gpr_names = is_numeric ? riscv_gpr_names_numeric : riscv_gpr_names_abi;
   /* Set FPR register names to disassemble.  */
@@ -119,6 +215,25 @@ init_riscv_dis_state_for_arch_and_options (void)
       = !riscv_subset_supports (&riscv_rps_dis, "zfinx")
 	    ? (is_numeric ? riscv_fpr_names_numeric : riscv_fpr_names_abi)
 	    : riscv_gpr_names;
+  /* Save previous options and mark them "unchanged".  */
+  is_arch_changed = false;
+}
+
+/* Update architecture for disassembler with its context.
+   Call initialization functions if either:
+   -  the architecture for current context is changed or
+   -  context is updated to a new one.  */
+
+static void
+update_riscv_dis_arch (riscv_dis_arch_context_t *context, const char *arch)
+{
+  if ((set_riscv_dis_arch_context (context, arch)
+       && dis_arch_context_current == context)
+      || set_riscv_current_dis_arch_context (context))
+    {
+      init_riscv_dis_state_for_arch ();
+      init_riscv_dis_state_for_arch_and_options ();
+    }
 }
 \f
 
@@ -874,7 +989,8 @@ riscv_get_map_state_by_name (const char *name, const char** arch)
 static bool
 riscv_get_map_state (int n,
 		     enum riscv_seg_mstate *state,
-		     struct disassemble_info *info)
+		     struct disassemble_info *info,
+		     bool update)
 {
   const char *name, *arch = NULL;
 
@@ -888,12 +1004,25 @@ riscv_get_map_state (int n,
   if (newstate == MAP_NONE)
     return false;
   *state = newstate;
-  if (arch)
-    {
-      riscv_release_subset_list (&riscv_subsets);
-      riscv_parse_subset (&riscv_rps_dis, arch);
-      init_riscv_dis_state_for_arch_and_options ();
-    }
+  if (newstate == MAP_INSN && update)
+  {
+    if (arch)
+      {
+	/* Override the architecture.  */
+	update_riscv_dis_arch (&dis_arch_context_override, arch);
+      }
+    else if (!from_last_map_symbol
+	     && set_riscv_current_dis_arch_context (&dis_arch_context_default))
+      {
+	/* Revert to the default architecture and call init functions if:
+	   - there's no ISA string in the mapping symbol,
+	   - mapping symbol is not reused and
+	   - current disassembler context is changed to the default one.
+	   This is a shortcut path to avoid full update_riscv_dis_arch.  */
+	init_riscv_dis_state_for_arch ();
+	init_riscv_dis_state_for_arch_and_options ();
+      }
+  }
   return true;
 }
 
@@ -905,7 +1034,6 @@ riscv_search_mapping_symbol (bfd_vma memaddr,
 			     struct disassemble_info *info)
 {
   enum riscv_seg_mstate mstate;
-  bool from_last_map_symbol;
   bool found = false;
   int symbol = -1;
   int n;
@@ -945,7 +1073,7 @@ riscv_search_mapping_symbol (bfd_vma memaddr,
       /* We have searched all possible symbols in the range.  */
       if (addr > memaddr)
 	break;
-      if (riscv_get_map_state (n, &mstate, info))
+      if (riscv_get_map_state (n, &mstate, info, true))
 	{
 	  symbol = n;
 	  found = true;
@@ -972,7 +1100,7 @@ riscv_search_mapping_symbol (bfd_vma memaddr,
 	  if (addr < (info->section ? info->section->vma : 0))
 	    break;
 	  /* Stop searching once we find the closed mapping symbol.  */
-	  if (riscv_get_map_state (n, &mstate, info))
+	  if (riscv_get_map_state (n, &mstate, info, true))
 	    {
 	      symbol = n;
 	      found = true;
@@ -1008,7 +1136,7 @@ riscv_data_length (bfd_vma memaddr,
 	{
 	  bfd_vma addr = bfd_asymbol_value (info->symtab[n]);
 	  if (addr > memaddr
-	      && riscv_get_map_state (n, &m, info))
+	      && riscv_get_map_state (n, &m, info, false))
 	    {
 	      if (addr - memaddr < length)
 		length = addr - memaddr;
@@ -1134,7 +1262,7 @@ print_insn_riscv (bfd_vma memaddr, struct disassemble_info *info)
 disassembler_ftype
 riscv_get_disassembler (bfd *abfd)
 {
-  const char *default_arch = "rv64gc";
+  const char *default_arch = initial_default_arch;
 
   if (abfd && bfd_get_flavour (abfd) == bfd_target_elf_flavour)
     {
@@ -1150,12 +1278,14 @@ riscv_get_disassembler (bfd *abfd)
 						  attr[Tag_c].i,
 						  &default_priv_spec);
 	  default_arch = attr[Tag_RISCV_arch].s;
+	  /* For ELF files with (somehow) no architecture string
+	     in the attributes, use the default value.  */
+	  if (!default_arch)
+	    default_arch = initial_default_arch;
 	}
     }
 
-  riscv_release_subset_list (&riscv_subsets);
-  riscv_parse_subset (&riscv_rps_dis, default_arch);
-  init_riscv_dis_state_for_arch_and_options ();
+  update_riscv_dis_arch (&dis_arch_context_default, default_arch);
   return print_insn_riscv;
 }
 
-- 
2.38.1


  parent reply	other threads:[~2022-11-28  4:45 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-15  4:52 [PATCH 00/11] RISC-V: Requirements for disassembler optimizations batch 1 Tsukasa OI
2022-11-15  4:52 ` [PATCH 01/11] opcodes/riscv-dis.c: More tidying Tsukasa OI
2022-11-15  4:52 ` [PATCH 02/11] RISC-V: Add test for 'Zfinx' register switching Tsukasa OI
2022-11-15  4:52 ` [PATCH 03/11] RISC-V: Make mapping symbol checking consistent Tsukasa OI
2022-11-15  4:52 ` [PATCH 04/11] RISC-V: Split riscv_get_map_state into two steps Tsukasa OI
2022-11-15  4:52 ` [PATCH 05/11] RISC-V: One time CSR hash table initialization Tsukasa OI
2022-11-15  4:52 ` [PATCH 06/11] RISC-V: Use static xlen on ADDIW sequence Tsukasa OI
2022-11-15  4:52 ` [PATCH 07/11] opcodes/riscv-dis.c: Add form feed for separation Tsukasa OI
2022-11-15  4:52 ` [PATCH 08/11] RISC-V: Split match/print steps on disassembler Tsukasa OI
2022-11-15  4:52 ` [PATCH 09/11] RISC-V: Reorganize disassembler state initialization Tsukasa OI
2022-11-15  4:52 ` [PATCH 10/11] RISC-V: Reorganize arch-related initialization and management Tsukasa OI
2022-11-15  4:52 ` [PATCH 11/11] RISC-V: Move disassembler private data initialization Tsukasa OI
2022-11-28  4:43 ` [PATCH v2 00/11] RISC-V: Requirements for disassembler optimizations batch 1 Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 01/11] opcodes/riscv-dis.c: More tidying Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 02/11] RISC-V: Add test for 'Zfinx' register switching Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 03/11] RISC-V: Make mapping symbol checking consistent Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 04/11] RISC-V: Split riscv_get_map_state into two steps Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 05/11] RISC-V: One time CSR hash table initialization Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 06/11] RISC-V: Use static xlen on ADDIW sequence Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 07/11] opcodes/riscv-dis.c: Add form feed for separation Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 08/11] RISC-V: Split match/print steps on disassembler Tsukasa OI
2022-11-28  4:43   ` [PATCH v2 09/11] RISC-V: Reorganize disassembler state initialization Tsukasa OI
2022-11-28  4:43   ` Tsukasa OI [this message]
2022-11-28  4:43   ` [PATCH v2 11/11] RISC-V: Move disassembler private data initialization Tsukasa OI

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0e328130d90af42f30e3632fadd6c48e422b7b82.1669610611.git.research_trasio@irq.a4lg.com \
    --to=research_trasio@irq.a4lg.com \
    --cc=binutils@sourceware.org \
    --cc=kito.cheng@sifive.com \
    --cc=nelson@rivosinc.com \
    --cc=palmer@dabbelt.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).