From: Tsukasa OI
To: Tsukasa OI, Nelson Chu, Kito Cheng, Palmer Dabbelt
Cc: binutils@sourceware.org
Subject: [PATCH 0/1] RISC-V: Use faster hash table on disassembling
Date: Sat, 9 Jul 2022 12:49:10 +0900

Hello,

This patchset intends to improve the performance of disassembling RISC-V
code (which may contain invalid data). It replaces riscv_hash (in
opcodes/riscv-dis.c) with a much faster data structure: a sorted and
partitioned hash table.

Tracker on GitHub:

Sidenote: I started listing my Binutils submissions on my GitHub Wiki:
I hope this makes the current status and any conflicting patches clear.

***WARNING***
This patchset conflicts with the following patchset(s):

- (Tracker: )

If either of them is merged, I will submit a rebased patchset.

This technique is already used for the SPARC architecture
(opcodes/sparc-dis.c), and I simplified the algorithm even further.
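A minimal C sketch of a sorted and partitioned opcode table, in the spirit of the SPARC approach mentioned above, may help make the idea concrete. It is an illustration under stated assumptions, not the code in this patch. `struct toy_opcode`, `hash_idx`, `build_table`, `lookup` and `demo_ops` are hypothetical names. Matching is a plain mask/match test rather than the per-opcode match functions binutils uses. The bucket index is assumed to be the major opcode: the low 7 bits, or the low 2 bits for compressed encodings.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct riscv_opcode:
   an instruction matches when (word & mask) == match.  */
struct toy_opcode
{
  const char *name;
  uint32_t match;
  uint32_t mask;
};

/* Assumption: one bucket per value of the 7-bit major opcode field.  */
#define NUM_BUCKETS 128

/* Assumed bucket index: 16-bit (compressed) encodings hash on their low
   2 bits, all other encodings on the low 7 bits (the major opcode).  */
static unsigned
hash_idx (uint32_t insn)
{
  return (insn & 0x3) != 0x3 ? (insn & 0x3) : (insn & 0x7f);
}

/* sorted[] holds every opcode, grouped by bucket.  start[h] points to the
   first entry of bucket h and start[NUM_BUCKETS] is the global tail, so
   bucket h always spans [start[h], start[h + 1]).  */
static const struct toy_opcode **sorted;
static const struct toy_opcode **start[NUM_BUCKETS + 1];

/* Build the table once.  A stable counting sort keeps the original table
   order inside each bucket.  Returns 0 on success, -1 if malloc fails.  */
static int
build_table (const struct toy_opcode *ops, size_t n)
{
  size_t count[NUM_BUCKETS] = { 0 };
  size_t pos[NUM_BUCKETS];
  size_t i, sum = 0;
  unsigned h;

  sorted = malloc ((n ? n : 1) * sizeof *sorted);
  if (sorted == NULL)
    return -1;
  for (i = 0; i < n; i++)
    count[hash_idx (ops[i].match)]++;
  for (h = 0; h < NUM_BUCKETS; h++)
    {
      pos[h] = sum;
      start[h] = sorted + sum;
      sum += count[h];
    }
  start[NUM_BUCKETS] = sorted + sum;  /* Global tail.  */
  for (i = 0; i < n; i++)
    sorted[pos[hash_idx (ops[i].match)]++] = &ops[i];
  return 0;
}

/* Return the first opcode matching WORD, or NULL.  Only WORD's own bucket
   is scanned; *VISITED reports how many entries were examined.  */
static const struct toy_opcode *
lookup (uint32_t word, size_t *visited)
{
  unsigned h = hash_idx (word);
  const struct toy_opcode **p;

  *visited = 0;
  for (p = start[h]; p != start[h + 1]; p++)
    {
      ++*visited;
      if ((word & (*p)->mask) == (*p)->match)
        return *p;
    }
  return NULL;
}

/* Sample entries: a vector, two base-ISA, one compressed and one U-type
   instruction, given as (match, mask) pairs.  */
static const struct toy_opcode demo_ops[] = {
  { "vadd.vv", 0x00000057, 0xfc00707f },
  { "addi",    0x00000013, 0x0000707f },
  { "c.addi",  0x00000001, 0x0000e003 },
  { "add",     0x00000033, 0xfe00707f },
  { "lui",     0x00000037, 0x0000007f },
};
```

After `build_table (demo_ops, 5)`, looking up the OP-IMM word 0x00a00513 (addi a0, zero, 10) examines only the addi entry and never the vadd.vv one. The word 0xffffffff, whose bucket is empty, is rejected without examining any entry. The tail pointer start[NUM_BUCKETS] is what lets every bucket end at start[h + 1] without a special case for the last bucket.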
Unlike SPARC, the RISC-V hashed opcode table is not a table of linked
lists. It is just a table of pointers to the "start" element of each
hash code's partition in the opcode list (sorted by hash code), plus a
global tail.

I benchmarked several programs and measured a performance increase of
roughly 2% to 10% when disassembling the code sections of RISC-V ELF
files (objdump -d $FILE). That is not a dramatic gain, but it is not bad
for such a small modification (which allocates about 11KB of heap memory
on a 64-bit host).

This is not the end. This structure significantly improves the handling
of plain binary files (objdump -b binary -m riscv:rv[32|64] -D $FILE).
I tested on a big vmlinux image with debug symbols and got a performance
boost of over 50%. This is because disassembling roughly a quarter of
the invalid "instruction" words required iterating over about one
thousand opcode entries (at least 348 of them being OP-V vector
instructions, which this new data structure can easily skip).

Thanks,
Tsukasa

Tsukasa OI (1):
  RISC-V: Use faster hash table on disassembling

 opcodes/riscv-dis.c | 214 ++++++++++++++++++++++++++++----------------
 1 file changed, 136 insertions(+), 78 deletions(-)

base-commit: d2acd4b0c5bab349aaa152d60268bc144634a844
-- 
2.34.1