From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-sender-0.a4lg.com (mail-sender.a4lg.com [153.120.152.154]) by sourceware.org (Postfix) with ESMTPS id DCB093858D32 for ; Sat, 30 Jul 2022 04:20:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DCB093858D32 Received: from [127.0.0.1] (localhost [127.0.0.1]) by mail-sender-0.a4lg.com (Postfix) with ESMTPSA id 3EBE7300089; Sat, 30 Jul 2022 04:20:40 +0000 (UTC) Message-ID: Date: Sat, 30 Jul 2022 13:20:38 +0900 Mime-Version: 1.0 Subject: Re: [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Content-Language: en-US To: Kito Cheng , Palmer Dabbelt , Nelson Chu Cc: binutils@sourceware.org References: From: Tsukasa OI In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: binutils@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Binutils mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Jul 2022 04:20:43 -0000 This patch is postponed because I will merge this into a larger patchset with general core disassembler improvements. Thanks, Tsukasa On 2022/06/03 21:05, Tsukasa OI wrote: > Hello, > > I noticed another disassembler issue and this is a fix. > > > As I submitted in my previous patchset: > , > there is a problem while disassembling zext.h instruction on > RV{32,64}_Zbb_Zbkb configuration. > > On RV32, zext.h on Zbb is a specialized form of pack (Zbkb). > On RV64, zext.h on Zbb is a specialized form of packw (Zbkb). > > Note that, if Zbkb extension is disabled, zext.h is a single > instruction with no "generalized" forms. > > > When **both** Zbb and Zbkb extensions are enabled and a zext.h > instruction is disassembled with `-M no-aliases', it should print > a non-alias instruction (that is, either pack or packw). > > # Input source file > _start: > zext.h a0, a1 # Specialized form on Zbb > > # Expected (RV32_Zbb_Zbkb, objdump -d -M no-aliases) > 80000028 <_start>: > 80000028: 0805c533 pack a0,a1,zero > > However, it prints an alias. > > # Actual (RV32_Zbb_Zbkb, objdump -d -M no-aliases) > 80000028 <_start>: > 80000028: 0805c533 zext.h a0,a1 > > Note that, depending on -march, an "alias" is no longer an alias > (as I noted earlier). > > # Expected/Actual (RV32_Zbb, objdump -d -M no-aliases) > 80000028 <_start>: > 80000028: 0805c533 zext.h a0,a1 > > > In general, this kind of issue occurs when: > > 1. There are two instructions > (one of them is a specialized form of another). > 2. Those requirements are different (separate) but can co-exist. > > Because of 2., both instructions cannot be simple aliases (INSN_ALIAS > cannot be used). However on non-alias instructions, if a match is > found, riscv_disassemble_insn thinks this is it and quits too early. > > Because zext.h is declared before pack and packw, generalized forms are > not matched for zext.h. > > > > > As a solution, I propose a new concept: generic subsets. > > For instructions with INSN_GENERICS, opcode matching cannot early-quit. > Instead it searches an instruction with: > > - Longest mask (by default; when -M no-aliases is NOT specified) > - Shortest mask (when -M no-aliases is specified) > > "Length" of the mask equals population count. More one bits on the mask > means more specialized instruction form. > > It fixes disassembler on following instructions and configurations: > > - zext.h (Zbb) <--> pack (Zbkb; RV32) > - zext.h (Zbb) <--> packw (Zbkb; RV64) > > Note that INSN_GENERICS (new flag in PATCH 2) must be set **both** on > specialized and generic forms. In the example above, those instructions > require INSN_GENERICS: > > - zext.h (two XLEN-specific forms) > - pack > - packw > > This concept can be used to following instruction pairs where the same > issue can occur once non-ratified instruction is ratified. > > - orc.b (Zbb) <--> gorci (NOT RATIFIED) > - brev8 (Zbkb) <--> grevi (NOT RATIFIED) > - zip (Zbkb) <--> shfli (NOT RATIFIED) > - unzip (Zbkb) <--> unshfli (NOT RATIFIED) > - rev8 (Zbb/Zbkb) <--> grevi (NOT RATIFIED) > > > > > PATCH 1: Reorganize riscv_disassemble_insn function > > riscv_disassemble_insn function (before PATCH 1) works like this: > > FOREACH (opcode) > { > if (!opcode.MATCH(insn)) > continue; > // > // Print insn with matched opcode. > // > return insnlen; > } > // > // opcode not matched. > // > > Because instruction printing code is in the opcode loop, this is not > good to fix the problem. This patch reorganizes the function like this: > > matched_opcode = NULL; > // Step 1: opcode matching > FOREACH (opcode) > { > if (!opcode.MATCH(insn)) > continue; > matched_opcode = opcode; > break; > } > // Step 2: opcode printing > if (matched_opcode != NULL) > { > // > // Print insn with matched_opcode. > // > return insnlen; > } > // > // opcode not matched. > // > > PATCH 1 splits opcode loop to two separate steps: opcode matching and > printing. With this, we can modify opcode matching step to implement > "generic subsets". > > > > > PATCH 2: Implement "generic subsets" > > With this commit, riscv_disassemble_insn function works like this: > > int masklen; > matched_opcode = NULL; > // Step 1: opcode matching > FOREACH (opcode) > { > if (!opcode.MATCH(insn)) > continue; > if (!(opcode.pinfo & INSN_GENERICS)) > { > matched_opcode = opcode; > break; // Early-quit as before (for non generic subsets) > } > // Handling of generic subsets > if (matched_opcode == NULL) > { > matched_opcode = opcode; > // Set corresponding masklen. > } > else if (no_aliases) > { > // Prefer opcode with shortest mask. > // Update match_opcode and masklen where necessary. > } > else > { > // Prefer opcode with longest mask. > // Update match_opcode and masklen where necessary. > } > } > // Step 2: opcode printing (continues...) > > PATCH 2 also adds INSN_GENERICS flag to two zext.h forms and pack/packw > instructions. > > > > > PATCH 3 contains additional testcases. > > > > > Thanks, > Tsukasa > > > > > Tsukasa OI (3): > RISC-V: Split disasm opcode matching and printing > RISC-V: Implement "generic subsets" for disasm > RISC-V: Add testcases disassembling zext.h > > gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d | 11 +++ > gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d | 11 +++ > gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d | 11 +++ > gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s | 2 + > gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d | 11 +++ > gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s | 4 + > include/opcode/riscv.h | 4 + > opcodes/riscv-dis.c | 93 ++++++++++++-------- > opcodes/riscv-opc.c | 8 +- > 9 files changed, 114 insertions(+), 41 deletions(-) > create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d > create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d > create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d > create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s > create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d > create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s > > > base-commit: 625b6eae091709b95471eae92d42dc6bc71e6553