public inbox for binutils@sourceware.org
* [PATCH 0/1] RISC-V: Use faster hash table on disassembling
@ 2022-07-09  3:49 Tsukasa OI
  2022-07-09  3:49 ` [PATCH 1/1] " Tsukasa OI
  2022-07-30  4:22 ` [PATCH 0/1] " Tsukasa OI
  0 siblings, 2 replies; 3+ messages in thread
From: Tsukasa OI @ 2022-07-09  3:49 UTC
  To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt; +Cc: binutils

Hello,

This patchset aims to improve the performance of disassembling RISC-V
code (which may contain invalid data).  It replaces riscv_hash
(in opcodes/riscv-dis.c) with a much faster data structure: a sorted and
partitioned hash table.

Tracker on GitHub:
<https://github.com/a4lg/binutils-gdb/wiki/riscv_dis_opts_hashtable>

    Sidenote:
    I have started listing my Binutils submissions on my GitHub Wiki:
    <https://github.com/a4lg/binutils-gdb/wiki/Patch-Queue>
    so that the current status and any conflicting patches are clear.

***WARNING***

This patchset conflicts with the following patchset:
-   <https://sourceware.org/pipermail/binutils/2022-June/121441.html>
    (Tracker: <https://github.com/a4lg/binutils-gdb/wiki/riscv_dis_generics>)
Whichever of the two is merged first, I will submit a rebased version of
the other.



This technique is already used for the SPARC architecture
(opcodes/sparc-dis.c), and I simplified the algorithm even further.
Unlike on SPARC, the RISC-V hashed opcode table is not a table of linked
lists.  It is just a table pointing to the "start" elements of the
opcode list (sorted by hash code), plus a global tail.

I benchmarked several programs and measured a performance increase of
roughly 2% to 10% when disassembling the code sections of RISC-V ELF
files (objdump -d $FILE).  That is not dramatic, but not bad for such a
small modification (at the cost of about 11KB of heap memory allocation
on a 64-bit environment).

That is not all.  This structure significantly improves plain binary
file handling (with objdump: objdump -b binary -m riscv:rv[32|64] -D
$FILE).  I tested on a big vmlinux image with debug symbols and got
a performance boost of over 50%.  The reason is that, for about one
quarter of the invalid "instruction" words, disassembly required
iterating over one thousand opcode entries (at least 348 of them being
vector instructions with OP-V, which can easily be skipped with this
new data structure).

Thanks,
Tsukasa




Tsukasa OI (1):
  RISC-V: Use faster hash table on disassembling

 opcodes/riscv-dis.c | 214 ++++++++++++++++++++++++++++----------------
 1 file changed, 136 insertions(+), 78 deletions(-)


base-commit: d2acd4b0c5bab349aaa152d60268bc144634a844
-- 
2.34.1

