From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-sender-0.a4lg.com (mail-sender.a4lg.com [153.120.152.154]) by sourceware.org (Postfix) with ESMTPS id 9A3773858CDA for ; Sat, 19 Nov 2022 07:11:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9A3773858CDA Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=irq.a4lg.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=irq.a4lg.com Received: from [127.0.0.1] (localhost [127.0.0.1]) by mail-sender-0.a4lg.com (Postfix) with ESMTPSA id 0DE28300089; Sat, 19 Nov 2022 07:11:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=irq.a4lg.com; s=2017s01; t=1668841883; bh=t17Vre+xztoO8Gd9ifwVqgVsGUQ0KimVzg8RcJCjgQU=; h=From:To:Cc:Subject:Date:Message-Id:Mime-Version: Content-Transfer-Encoding; b=rrw/dcfPozAMba6b6oRGKmBOzb6KwlfILfYIftMfqoa99HpFrVKyi5pwZS7LZVGZA h46YcgbWK591OgTIxUIw1hujbzwT+BTk/eMvK56hgzdUGtMkXX7VVi+WkfOdObHoW2 Gr5P67mb0h+vsB2KjZGIl9dFGgv9BMqP0ab+fcaE= From: Tsukasa OI To: Tsukasa OI , Jan Beulich , Nelson Chu , Kito Cheng , Palmer Dabbelt Cc: binutils@sourceware.org Subject: [PATCH 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) Date: Sat, 19 Nov 2022 07:10:32 +0000 Message-Id: Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-6.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello, Commit bb996692bd96 ("RISC-V/gas: allow generating up to 176-bit instructions with .insn") was suddenly merged into master by Jan Beulich. Although this encoding is not considered frozen (see the section "Expanded Instruction-Length Encoding" in the ISA Manual), such attempt makes sense. However, it was insufficient in some ways. I quickly found a stack-based buffer overflow and fixed that. Besides that, I found following issues related to long (assembler: ".insn"-based, disassembler: unrecognized) instructions: [Assembler: False "value conflicts with instruction length" error] .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 44 hexadecimal digits (22 bytes) GAS successfully compiles this (0x607f is one of the 2-byte prefixes of 22-byte instructions). However, it fails on following examples. .insn 22, 0xfedcba98765432100123456789ab607f ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 32 hexadecimal digits (16 bytes) with following error: a.s: Assembler messages: a.s:1: Error: value conflicts with instruction length `22,0xfedcba98765432100123456789ab607f' This is not good. Yes, the value is 16 bytes but upper 6-bytes of a 22-byte instruction could be just zeroed out. It should be valid just like: .insn 22, 0x607f This is a testcase from insn.s and current GAS can handle this. The only reason which the example above fails is, the 16-byte constant is handled as O_big (a big number). The root cause of this is: if (bytes != imm_expr->X_add_number * CHARS_PER_LITTLENUM) return _("value conflicts with instruction length"); where: bytes : Expected instruction length imm_expr->X_add_number : MINIMUM Number of entries (of generic_bignum) required to represent a big number. Non-zero if O_big. CHARS_PER_LITTLENUM : A big number is represented as an array of little numbers (current GAS: unsigned short). This is the number of chars of a little number. That means, if a big number can be represented with different bytes as expected, ".insn" fails with "value conflicts with instruction length". Replacing this with: if (bytes < imm_expr->X_add_number * CHARS_PER_LITTLENUM) /* == -> < */ return _("value conflicts with instruction length"); would resolve this problem on the current GAS but this depends on how the big numbers are represented. If big number changes its base type (such as 2^32), it will stop working. My solution (on PATCH 2/2) is much complex. It computes exact bytes required to repsent a big number and fails only if number of represented bytes exceeds expected number of bytes. unsigned int llen = 0; for (LITTLENUM_TYPE lval = generic_bignum[imm_expr->X_add_number - 1]; lval != 0; llen++) lval >>= BITS_PER_CHAR; unsigned int repr_bytes = (imm_expr->X_add_number - 1) * CHARS_PER_LITTLENUM + llen; if (bytes < repr_bytes) return _("value conflicts with instruction length"); Although it may not work on all possible bigint encoding configurations, my implementation is flexible enough that any realistic bigint configurations would be compatible. This is the result after this fix (and the disassembler fix): 0: 607f 89ab 4567 0123 .byte 0x7f, 0x60, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 8: 3210 7654 ba98 fedc 10: 0000 0000 0000 [Assembler: DWARF line number information is missing with big numbers] We can compile this on the current GAS (note: line number is prefixed): 1: .insn 22, 0x607f 2: .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 44 hexadecimal digits (22 bytes) with "as -gdwarf-2 -o a.o -c a.s" or "as -g -o a.o -c a.s". However, if we extract the debug information, the result will be rather strange. Here's the result of "objdump -WL a.o": a.o: file format elf64-littleriscv Contents of the .debug_line section: CU: a.s: File name Line number Starting address View Stmt a.s 1 0 x a.s - 0x2c A 22-byte instruction on the address 0 (line 1) is fine. But wait, where the another 22-byte instruction on the address 0x16 (line 2) go? Before describing, we'll see what's going on in the line 1. 1. s_riscv_insn (".insn" directive) is called a. riscv_ip (".insn" with known instruction encoding) is called but fails with "unrecognized opcode". b. riscv_ip_hardcode (".insn" with raw values) is called and succeeds. c. append_insn is called **when imm_expr.X_op is not O_big**) -> Go to 2 2. append_insn (output an instruction) a. dwarf2_emit_insn is called (DWARF debug information is appended) b. relocation / fixups (if relaxed instructions are emitted, return early) c. add_fixed_insn (add the instruction to the end of the output) The reason why it fails on line 2 (O_big) is 1.c. Because DWARF debug information is appended on 2.a., this part is not called because append_insn is not called when imm_expr.X_op is O_big. Instead, it does completely separate handling on the riscv_ip_hardcode function (1.b.). My patchset (PATCH 2/2) unifies this separate handling with regular instructions. That means, 1.c. is changed so that append_insn is called even if imm_expr.X_op is O_big. To unify handling, it adds insn_long_opcode (long opcode encoding up to 22-bytes) to struct riscv_cl_insn and the install_insn function (called by add_fixed_insn function) is modified to handle long instructions. riscv_ip_hardcode function is modified to store long instruction encodings to new insn_long_opcode field. As a result, ".insn" directive with a big number emits correct debug information like this: a.o: file format elf64-littleriscv Contents of the .debug_line section: CU: a.s: File name Line number Starting address View Stmt a.s 1 0 x a.s 2 0x16 1 x a.s - 0x2c [Disassembler: Instruction is trimmed with 64-bits] In this section, we reuse the object file generated by the section above (two 22-byte instructions). 0000000000000000 <.text>: 0: 607f 0000 0000 0000 .byte 0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 8: 0000 0000 0000 0000 10: 0000 0000 0000 16: 607f 33cc 55aa cdef .byte 0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 1e: 89ab 4567 0123 3210 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 26: 7654 ba98 fedc zeroed out after first 64-bits See ".byte" at the address 0x16. It's trimmed at 64-bits. The resolution is simple. If we simply add a char* argument (containing all instruction bits), we can easily resolve this. 0000000000000000 <.text>: 0: 607f 0000 0000 0000 .byte 0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 8: 0000 0000 0000 0000 10: 0000 0000 0000 16: 607f 33cc 55aa cdef .byte 0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe 1e: 89ab 4567 0123 3210 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 26: 7654 ba98 fedc all instruction bits are correct With this patchset, although we don't support any "valid" long instructions yet, we can support long "invalid" instructions (only emitted through ".insn" and the encoded bits do not match to any valid instructions and printed as ".byte"). This patchset is also important because it makes the assembler behavior consistent (does not depend whether the raw value can be represented with a small constant). Thanks, Tsukasa Tsukasa OI (2): RISC-V: Make .insn tests stricter RISC-V: Better support for long instructions gas/config/tc-riscv.c | 50 +++++++++++++++++++++++----- gas/testsuite/gas/riscv/insn-dwarf.d | 10 +++++- gas/testsuite/gas/riscv/insn-na.d | 28 ++++++++++------ gas/testsuite/gas/riscv/insn.d | 32 +++++++++++++++--- gas/testsuite/gas/riscv/insn.s | 9 +++++ opcodes/riscv-dis.c | 13 +++++--- 6 files changed, 113 insertions(+), 29 deletions(-) base-commit: 15253318be0995200cc59929ca32eedbfd041e45 -- 2.38.1