From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x22c.google.com (mail-oi1-x22c.google.com [IPv6:2607:f8b0:4864:20::22c]) by sourceware.org (Postfix) with ESMTPS id 724F2385457E for ; Tue, 22 Nov 2022 00:43:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 724F2385457E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-oi1-x22c.google.com with SMTP id m204so14307181oib.6 for ; Mon, 21 Nov 2022 16:43:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KOwEmOLQVZwtAgfAEu464fuDUDhUqTtJIbG8rUEMZPQ=; b=0R1TJasGOJfNBSdTuPWpnFl1kCjgKEGiuIvQHsiP8T1pxPGXHzoIOtMCUrC6ghhGjJ 7x2VmgfqQctV1aiiHOUrhyC9Z4qfcsyMktvMEMasNpr6h/3H4fOiKmTscGwc3dKSzjRS fitPkPKdw2v66UlDVqhh/QJTojJL/qU/XnGab0JCFl1J2Hoqs5ZyLrGagkKQ8coyGbO8 3htCx3Kc/tHYRnIG5Kc8RPM5+EWeAMNSHOeY3tPE8A3g/6E1r4sOykeG33/Sev0ZaEV1 bP9o3Uwj4YJxBzgEcmMm7v4hrKEGxIHMAs5JGdWiFEOgNn7bAZzSGDc2ZubAtWAp7keW sr8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=KOwEmOLQVZwtAgfAEu464fuDUDhUqTtJIbG8rUEMZPQ=; b=z8ykU3Vs/G5bpKl/2XopHnMTU7Z9YOl7o7Ah9TJ4FDl6QB8Kup/5JaBvnBKLCMDX/f wFxyzY5SuOQyGf8+UEYIYWg2U7WqfvAN2v2Dnv+k3FEBxdPTykV+SWnZO/QjdvVfrD+Q ekzyAu+CMdN7XVyCHsGcytxzwOPish2OezKIIgI/Wns1s/8gl0pYQ3QWe+ffsFoIhVn8 ayXka8FRH6RnRMRhitfDI54VtxCBsmVDs4KomTpPJ4vJwBDCI6lbbePMT4j+TAXkAT0i d3My1GtxDFo9GdreZVAt5WInaI16GCnUQ0xlDhm9kNg6YoeaPNg7LIhh3XNLeVy7ERsA WxRw== X-Gm-Message-State: ANoB5pn1FtQDoGx5A5TfMa758j55eI0cvQI1Le7zC/BqTC7nkI6tg8jV xlZ1VUtbhrLjx5GXxnohH4dzfCjNC/GEyrL+8GEtzQ== X-Google-Smtp-Source: AA0mqf4Lmb/HSxsFGFDdXMT+OqTRJ3a3VWcn/UtiUyJG8i4fSjVhktSGUDwYa6N8TEYl3vjJ860B8BOnJtZrGZZHAoc= X-Received: by 2002:a54:4694:0:b0:35a:83c1:ce7 with SMTP id k20-20020a544694000000b0035a83c10ce7mr632839oic.107.1669077806581; Mon, 21 Nov 2022 16:43:26 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Nelson Chu Date: Tue, 22 Nov 2022 08:43:15 +0800 Message-ID: Subject: Re: [PATCH 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) To: Tsukasa OI Cc: Jan Beulich , Kito Cheng , Palmer Dabbelt , binutils@sourceware.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Sat, Nov 19, 2022 at 3:11 PM Tsukasa OI wrote: > > Hello, > > Commit bb996692bd96 ("RISC-V/gas: allow generating up to 176-bit > instructions with .insn") was suddenly merged into master by Jan Beulich. > Although this encoding is not considered frozen (see the section "Expanded > Instruction-Length Encoding" in the ISA Manual), such attempt makes sense. > > However, it was insufficient in some ways. I quickly found a stack-based > buffer overflow and fixed that. Besides that, I found following issues > related to long (assembler: ".insn"-based, disassembler: unrecognized) > instructions: > > > > [Assembler: False "value conflicts with instruction length" error] > > .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 44 hexadecimal digits (22 bytes) > > GAS successfully compiles this (0x607f is one of the 2-byte prefixes of > 22-byte instructions). However, it fails on following examples. > > .insn 22, 0xfedcba98765432100123456789ab607f > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 32 hexadecimal digits (16 bytes) > > with following error: > > a.s: Assembler messages: > a.s:1: Error: value conflicts with instruction length `22,0xfedcba98765432100123456789ab607f' > > This is not good. Yes, the value is 16 bytes but upper 6-bytes of a 22-byte > instruction could be just zeroed out. It should be valid just like: > > .insn 22, 0x607f > > This is a testcase from insn.s and current GAS can handle this. The only > reason which the example above fails is, the 16-byte constant is handled as > O_big (a big number). The root cause of this is: > > if (bytes != imm_expr->X_add_number * CHARS_PER_LITTLENUM) > return _("value conflicts with instruction length"); > > where: > > bytes : Expected instruction length > imm_expr->X_add_number : MINIMUM Number of entries (of generic_bignum) > required to represent a big number. > Non-zero if O_big. > CHARS_PER_LITTLENUM : A big number is represented as an array of > little numbers (current GAS: unsigned short). > This is the number of chars of a little number. > > That means, if a big number can be represented with different bytes as > expected, ".insn" fails with "value conflicts with instruction length". > > Replacing this with: > > if (bytes < imm_expr->X_add_number * CHARS_PER_LITTLENUM) /* == -> < */ > return _("value conflicts with instruction length"); > > would resolve this problem on the current GAS but this depends on how > the big numbers are represented. If big number changes its base type > (such as 2^32), it will stop working. > > My solution (on PATCH 2/2) is much complex. It computes exact bytes > required to repsent a big number and fails only if number of represented > bytes exceeds expected number of bytes. > > unsigned int llen = 0; > for (LITTLENUM_TYPE lval = generic_bignum[imm_expr->X_add_number - 1]; lval != 0; llen++) > lval >>= BITS_PER_CHAR; > unsigned int repr_bytes = (imm_expr->X_add_number - 1) * CHARS_PER_LITTLENUM + llen; > if (bytes < repr_bytes) > return _("value conflicts with instruction length"); > > Although it may not work on all possible bigint encoding configurations, my > implementation is flexible enough that any realistic bigint configurations > would be compatible. > > This is the result after this fix (and the disassembler fix): > > 0: 607f 89ab 4567 0123 .byte 0x7f, 0x60, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 > 8: 3210 7654 ba98 fedc > 10: 0000 0000 0000 > > > > [Assembler: DWARF line number information is missing with big numbers] > > We can compile this on the current GAS (note: line number is prefixed): > > 1: .insn 22, 0x607f > 2: .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 44 hexadecimal digits (22 bytes) > > with "as -gdwarf-2 -o a.o -c a.s" > or "as -g -o a.o -c a.s". > > However, if we extract the debug information, the result will be rather > strange. Here's the result of "objdump -WL a.o": > > a.o: file format elf64-littleriscv > > Contents of the .debug_line section: > > CU: a.s: > File name Line number Starting address View Stmt > a.s 1 0 x > a.s - 0x2c > > A 22-byte instruction on the address 0 (line 1) is fine. But wait, where > the another 22-byte instruction on the address 0x16 (line 2) go? Before > describing, we'll see what's going on in the line 1. > > 1. s_riscv_insn (".insn" directive) is called > a. riscv_ip (".insn" with known instruction encoding) is called > but fails with "unrecognized opcode". > b. riscv_ip_hardcode (".insn" with raw values) is called > and succeeds. > c. append_insn is called **when imm_expr.X_op is not O_big**) > -> Go to 2 > 2. append_insn (output an instruction) > a. dwarf2_emit_insn is called (DWARF debug information is appended) > b. relocation / fixups > (if relaxed instructions are emitted, return early) > c. add_fixed_insn (add the instruction to the end of the output) > > The reason why it fails on line 2 (O_big) is 1.c. Because DWARF debug > information is appended on 2.a., this part is not called because append_insn > is not called when imm_expr.X_op is O_big. Instead, it does completely > separate handling on the riscv_ip_hardcode function (1.b.). > > My patchset (PATCH 2/2) unifies this separate handling with regular > instructions. That means, 1.c. is changed so that append_insn is called > even if imm_expr.X_op is O_big. To unify handling, it adds insn_long_opcode > (long opcode encoding up to 22-bytes) to struct riscv_cl_insn and the > install_insn function (called by add_fixed_insn function) is modified to > handle long instructions. riscv_ip_hardcode function is modified to store > long instruction encodings to new insn_long_opcode field. > > As a result, ".insn" directive with a big number emits correct > debug information like this: > > a.o: file format elf64-littleriscv > > Contents of the .debug_line section: > > CU: a.s: > File name Line number Starting address View Stmt > a.s 1 0 x > a.s 2 0x16 1 x > a.s - 0x2c > > > > [Disassembler: Instruction is trimmed with 64-bits] > > In this section, we reuse the object file generated by the section above > (two 22-byte instructions). > > 0000000000000000 <.text>: > 0: 607f 0000 0000 0000 .byte 0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 > 8: 0000 0000 0000 0000 > 10: 0000 0000 0000 > 16: 607f 33cc 55aa cdef .byte 0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 > 1e: 89ab 4567 0123 3210 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 26: 7654 ba98 fedc zeroed out after first 64-bits > > See ".byte" at the address 0x16. It's trimmed at 64-bits. > The resolution is simple. If we simply add a char* argument (containing all > instruction bits), we can easily resolve this. > > 0000000000000000 <.text>: > 0: 607f 0000 0000 0000 .byte 0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 > 8: 0000 0000 0000 0000 > 10: 0000 0000 0000 > 16: 607f 33cc 55aa cdef .byte 0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe > 1e: 89ab 4567 0123 3210 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 26: 7654 ba98 fedc all instruction bits are correct > > > > With this patchset, although we don't support any "valid" long instructions > yet, we can support long "invalid" instructions (only emitted through > ".insn" and the encoded bits do not match to any valid instructions and > printed as ".byte"). > > This patchset is also important because it makes the assembler behavior > consistent (does not depend whether the raw value can be represented with > a small constant). > > > > Thanks, > Tsukasa > > > > > Tsukasa OI (2): > RISC-V: Make .insn tests stricter > RISC-V: Better support for long instructions Thanks for Jan's feedback and it makes sense. I have no further opinion here, so I will totally respect and follow Jan judgment for this series. Anyway, thanks for spending time to fix this. Nelson > gas/config/tc-riscv.c | 50 +++++++++++++++++++++++----- > gas/testsuite/gas/riscv/insn-dwarf.d | 10 +++++- > gas/testsuite/gas/riscv/insn-na.d | 28 ++++++++++------ > gas/testsuite/gas/riscv/insn.d | 32 +++++++++++++++--- > gas/testsuite/gas/riscv/insn.s | 9 +++++ > opcodes/riscv-dis.c | 13 +++++--- > 6 files changed, 113 insertions(+), 29 deletions(-) > > > base-commit: 15253318be0995200cc59929ca32eedbfd041e45 > -- > 2.38.1 >