From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <research_trasio@irq.a4lg.com>
Received: from mail-sender-0.a4lg.com (mail-sender.a4lg.com [153.120.152.154])
 by sourceware.org (Postfix) with ESMTPS id 092803858430
 for <binutils@sourceware.org>; Sat,  9 Jul 2022 03:49:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 092803858430
Received: from [127.0.0.1] (localhost [127.0.0.1])
 by mail-sender-0.a4lg.com (Postfix) with ESMTPSA id B081E300089;
 Sat,  9 Jul 2022 03:49:25 +0000 (UTC)
From: Tsukasa OI <research_trasio@irq.a4lg.com>
To: Tsukasa OI <research_trasio@irq.a4lg.com>,
 Nelson Chu <nelson.chu@sifive.com>, Kito Cheng <kito.cheng@sifive.com>,
 Palmer Dabbelt <palmer@dabbelt.com>
Cc: binutils@sourceware.org
Subject: [PATCH 0/1] RISC-V: Use faster hash table on disassembling
Date: Sat,  9 Jul 2022 12:49:10 +0900
Message-Id: <cover.1657338493.git.research_trasio@irq.a4lg.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: binutils@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Binutils mailing list <binutils.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/binutils>,
 <mailto:binutils-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/binutils/>
List-Post: <mailto:binutils@sourceware.org>
List-Help: <mailto:binutils-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/binutils>,
 <mailto:binutils-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Jul 2022 03:49:29 -0000

Hello,

This patchset aims to improve the performance of disassembling RISC-V
code (which may contain invalid data).  It replaces riscv_hash
(in opcodes/riscv-dis.c) with a much faster data structure: a sorted and
partitioned hash table.

Tracker on GitHub:
<https://github.com/a4lg/binutils-gdb/wiki/riscv_dis_opts_hashtable>

    Sidenote:
    I started listing my Binutils submissions on my GitHub Wiki:
    <https://github.com/a4lg/binutils-gdb/wiki/Patch-Queue>
    so that the current status and any conflicting patches are clear.

***WARNING***

This patchset conflicts with the following patchset(s):
-   <https://sourceware.org/pipermail/binutils/2022-June/121441.html>
    (Tracker: <https://github.com/a4lg/binutils-gdb/wiki/riscv_dis_generics>)
If either of them is merged, I will submit a rebased patchset.


This technique is actually used for the SPARC architecture
(opcodes/sparc-dis.c), and I simplified the algorithm even further.
Unlike SPARC, the RISC-V hashed opcode table is not a table of linked
lists; it is just a table of pointers to the "start" elements of the
sorted opcode list (sorted by hash code), plus a global tail.
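
As a rough illustration of that layout (a minimal sketch only, not the
code in the patch: every demo_* name, the five-entry opcode table and
the 7-bit hash on the major opcode field are assumptions made up for
this example), the table can be built by sorting pointers by hash code
and recording where each hash value starts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified entry (not the real struct riscv_opcode).  */
struct demo_opcode
{
  const char *name;
  uint32_t match;
  uint32_t mask;
};

/* Assumed hash: the low 7 bits of a 32-bit instruction word.  */
#define DEMO_HASH_SIZE 128u
#define DEMO_HASH(insn) ((insn) & (DEMO_HASH_SIZE - 1))

static struct demo_opcode demo_opcodes[] =
{
  { "add",  0x00000033, 0xfe00707f },
  { "addi", 0x00000013, 0x0000707f },
  { "sub",  0x40000033, 0xfe00707f },
  { "lw",   0x00002003, 0x0000707f },
  { "xor",  0x00004033, 0xfe00707f },
};

#define DEMO_NUM_OPCODES (sizeof (demo_opcodes) / sizeof (demo_opcodes[0]))

/* Pointers to the opcode entries, sorted by hash code.  */
static const struct demo_opcode *demo_sorted[DEMO_NUM_OPCODES];

/* demo_start[h] is the index of the first sorted entry whose hash is
   >= h.  The extra last slot is the global tail.  */
static size_t demo_start[DEMO_HASH_SIZE + 1];

static int
demo_compare (const void *a, const void *b)
{
  const struct demo_opcode *x = *(const struct demo_opcode *const *) a;
  const struct demo_opcode *y = *(const struct demo_opcode *const *) b;
  uint32_t hx = DEMO_HASH (x->match);
  uint32_t hy = DEMO_HASH (y->match);

  if (hx != hy)
    return hx < hy ? -1 : 1;
  /* Same hash: keep the original table order (qsort is not stable).  */
  return (x > y) - (x < y);
}

static void
demo_build (void)
{
  size_t i, h = 0;

  for (i = 0; i < DEMO_NUM_OPCODES; i++)
    demo_sorted[i] = &demo_opcodes[i];
  qsort (demo_sorted, DEMO_NUM_OPCODES, sizeof (demo_sorted[0]),
	 demo_compare);

  /* Partition: record where each hash value starts in the sorted list.  */
  for (i = 0; i < DEMO_NUM_OPCODES; i++)
    {
      uint32_t cur = DEMO_HASH (demo_sorted[i]->match);
      while (h <= cur)
	demo_start[h++] = i;
    }
  /* Hash values past the last used one, and the tail, point to the end.  */
  while (h <= DEMO_HASH_SIZE)
    demo_start[h++] = DEMO_NUM_OPCODES;
  assert (demo_start[DEMO_HASH_SIZE] == DEMO_NUM_OPCODES);
}

static const struct demo_opcode *
demo_lookup (uint32_t insn)
{
  uint32_t h = DEMO_HASH (insn);
  size_t i;

  /* Only the entries sharing this hash code are examined.  */
  for (i = demo_start[h]; i < demo_start[h + 1]; i++)
    if ((insn & demo_sorted[i]->mask) == demo_sorted[i]->match)
      return demo_sorted[i];
  return NULL;
}
```

After calling demo_build () once, demo_lookup (0x00b50533) (an encoding
of "add a0,a0,a1") walks only the three entries whose hash is 0x33, and
a word such as 0xffffffff hits an empty range and is rejected without
any comparison.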

I benchmarked several programs and measured a performance increase of
roughly 2% to 10% when disassembling the code sections of RISC-V
ELF files (objdump -d $FILE).  That is not significant, but not bad for
such a small modification (with ~ 11KB of heap memory allocated in a
64-bit environment).

That is not all.  This structure significantly improves plain binary
file handling (objdump -b binary -m riscv:rv[32|64] -D $FILE).  I tested
it on a big vmlinux image with debug symbols and got a performance
boost of over 50%.  This is because disassembling about one quarter of
the invalid "instruction" words required iterating over one thousand
opcode entries (>= 348 of them being vector instructions with OP-V,
which can easily be skipped with this new data structure).

Thanks,
Tsukasa


Tsukasa OI (1):
  RISC-V: Use faster hash table on disassembling

 opcodes/riscv-dis.c | 214 ++++++++++++++++++++++++++++----------------
 1 file changed, 136 insertions(+), 78 deletions(-)


base-commit: d2acd4b0c5bab349aaa152d60268bc144634a844
-- 
2.34.1