From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=6TYg=3W=rivosinc.com=nelson@sourceware.org>
Received: from mail-oi1-x22c.google.com (mail-oi1-x22c.google.com [IPv6:2607:f8b0:4864:20::22c])
	by sourceware.org (Postfix) with ESMTPS id 724F2385457E
	for <binutils@sourceware.org>; Tue, 22 Nov 2022 00:43:27 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 724F2385457E
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com
Received: by mail-oi1-x22c.google.com with SMTP id m204so14307181oib.6
        for <binutils@sourceware.org>; Mon, 21 Nov 2022 16:43:27 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=rivosinc-com.20210112.gappssmtp.com; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=KOwEmOLQVZwtAgfAEu464fuDUDhUqTtJIbG8rUEMZPQ=;
        b=0R1TJasGOJfNBSdTuPWpnFl1kCjgKEGiuIvQHsiP8T1pxPGXHzoIOtMCUrC6ghhGjJ
         7x2VmgfqQctV1aiiHOUrhyC9Z4qfcsyMktvMEMasNpr6h/3H4fOiKmTscGwc3dKSzjRS
         fitPkPKdw2v66UlDVqhh/QJTojJL/qU/XnGab0JCFl1J2Hoqs5ZyLrGagkKQ8coyGbO8
         3htCx3Kc/tHYRnIG5Kc8RPM5+EWeAMNSHOeY3tPE8A3g/6E1r4sOykeG33/Sev0ZaEV1
         bP9o3Uwj4YJxBzgEcmMm7v4hrKEGxIHMAs5JGdWiFEOgNn7bAZzSGDc2ZubAtWAp7keW
         sr8w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=KOwEmOLQVZwtAgfAEu464fuDUDhUqTtJIbG8rUEMZPQ=;
        b=z8ykU3Vs/G5bpKl/2XopHnMTU7Z9YOl7o7Ah9TJ4FDl6QB8Kup/5JaBvnBKLCMDX/f
         wFxyzY5SuOQyGf8+UEYIYWg2U7WqfvAN2v2Dnv+k3FEBxdPTykV+SWnZO/QjdvVfrD+Q
         ekzyAu+CMdN7XVyCHsGcytxzwOPish2OezKIIgI/Wns1s/8gl0pYQ3QWe+ffsFoIhVn8
         ayXka8FRH6RnRMRhitfDI54VtxCBsmVDs4KomTpPJ4vJwBDCI6lbbePMT4j+TAXkAT0i
         d3My1GtxDFo9GdreZVAt5WInaI16GCnUQ0xlDhm9kNg6YoeaPNg7LIhh3XNLeVy7ERsA
         WxRw==
X-Gm-Message-State: ANoB5pn1FtQDoGx5A5TfMa758j55eI0cvQI1Le7zC/BqTC7nkI6tg8jV
	xlZ1VUtbhrLjx5GXxnohH4dzfCjNC/GEyrL+8GEtzQ==
X-Google-Smtp-Source: AA0mqf4Lmb/HSxsFGFDdXMT+OqTRJ3a3VWcn/UtiUyJG8i4fSjVhktSGUDwYa6N8TEYl3vjJ860B8BOnJtZrGZZHAoc=
X-Received: by 2002:a54:4694:0:b0:35a:83c1:ce7 with SMTP id
 k20-20020a544694000000b0035a83c10ce7mr632839oic.107.1669077806581; Mon, 21
 Nov 2022 16:43:26 -0800 (PST)
MIME-Version: 1.0
References: <cover.1668841829.git.research_trasio@irq.a4lg.com>
In-Reply-To: <cover.1668841829.git.research_trasio@irq.a4lg.com>
From: Nelson Chu <nelson@rivosinc.com>
Date: Tue, 22 Nov 2022 08:43:15 +0800
Message-ID: <CAPpQWtD0yWw7B_3xAh8XpAiuoai48yr_G86Zzb_dSnCQKAz+Cw@mail.gmail.com>
Subject: Re: [PATCH 0/2] RISC-V: Better support for long instructions (64 < x
 <= 176 [bits])
To: Tsukasa OI <research_trasio@irq.a4lg.com>
Cc: Jan Beulich <jbeulich@suse.com>, Kito Cheng <kito.cheng@sifive.com>, 
	Palmer Dabbelt <palmer@dabbelt.com>, binutils@sourceware.org
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>

On Sat, Nov 19, 2022 at 3:11 PM Tsukasa OI <research_trasio@irq.a4lg.com> wrote:
>
> Hello,
>
> Commit bb996692bd96 ("RISC-V/gas: allow generating up to 176-bit
> instructions with .insn") was suddenly merged into master by Jan Beulich.
> Although this encoding is not considered frozen (see the section "Expanded
> Instruction-Length Encoding" in the ISA Manual), such attempt makes sense.
>
> However, it was insufficient in some ways.  I quickly found a stack-based
> buffer overflow and fixed that.  Besides that, I found following issues
> related to long (assembler: ".insn"-based, disassembler: unrecognized)
> instructions:
>
>
>
> [Assembler: False "value conflicts with instruction length" error]
>
>     .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 44 hexadecimal digits (22 bytes)
>
> GAS successfully compiles this (0x607f is one of the 2-byte prefixes of
> 22-byte instructions).  However, it fails on following examples.
>
>     .insn 22, 0xfedcba98765432100123456789ab607f
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 32 hexadecimal digits (16 bytes)
>
> with following error:
>
>     a.s: Assembler messages:
>     a.s:1: Error: value conflicts with instruction length `22,0xfedcba98765432100123456789ab607f'
>
> This is not good.  Yes, the value is 16 bytes but upper 6-bytes of a 22-byte
> instruction could be just zeroed out.  It should be valid just like:
>
>     .insn 22, 0x607f
>
> This is a testcase from insn.s and current GAS can handle this.  The only
> reason which the example above fails is, the 16-byte constant is handled as
> O_big (a big number).  The root cause of this is:
>
>     if (bytes != imm_expr->X_add_number * CHARS_PER_LITTLENUM)
>       return _("value conflicts with instruction length");
>
> where:
>
>     bytes                  : Expected instruction length
>     imm_expr->X_add_number : MINIMUM Number of entries (of generic_bignum)
>                              required to represent a big number.
>                              Non-zero if O_big.
>     CHARS_PER_LITTLENUM    : A big number is represented as an array of
>                              little numbers (current GAS: unsigned short).
>                              This is the number of chars of a little number.
>
> That means, if a big number can be represented with different bytes as
> expected, ".insn" fails with "value conflicts with instruction length".
>
> Replacing this with:
>
>     if (bytes < imm_expr->X_add_number * CHARS_PER_LITTLENUM) /* == -> <  */
>       return _("value conflicts with instruction length");
>
> would resolve this problem on the current GAS but this depends on how
> the big numbers are represented.  If big number changes its base type
> (such as 2^32), it will stop working.
>
> My solution (on PATCH 2/2) is much complex.  It computes exact bytes
> required to repsent a big number and fails only if number of represented
> bytes exceeds expected number of bytes.
>
>     unsigned int llen = 0;
>     for (LITTLENUM_TYPE lval = generic_bignum[imm_expr->X_add_number - 1]; lval != 0; llen++)
>       lval >>= BITS_PER_CHAR;
>     unsigned int repr_bytes = (imm_expr->X_add_number - 1) * CHARS_PER_LITTLENUM + llen;
>     if (bytes < repr_bytes)
>       return _("value conflicts with instruction length");
>
> Although it may not work on all possible bigint encoding configurations, my
> implementation is flexible enough that any realistic bigint configurations
> would be compatible.
>
> This is the result after this fix (and the disassembler fix):
>
>        0:   607f 89ab 4567 0123     .byte   0x7f, 0x60, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   3210 7654 ba98 fedc
>       10:   0000 0000 0000
>
>
>
> [Assembler: DWARF line number information is missing with big numbers]
>
> We can compile this on the current GAS (note: line number is prefixed):
>
> 1:  .insn 22, 0x607f
> 2:  .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 44 hexadecimal digits (22 bytes)
>
> with "as -gdwarf-2 -o a.o -c a.s"
>   or "as -g -o a.o -c a.s".
>
> However, if we extract the debug information, the result will be rather
> strange.  Here's the result of "objdump -WL a.o":
>
>     a.o:     file format elf64-littleriscv
>
>     Contents of the .debug_line section:
>
>     CU: a.s:
>     File name                            Line number    Starting address    View    Stmt
>     a.s                                            1                   0               x
>     a.s                                            -                0x2c
>
> A 22-byte instruction on the address 0 (line 1) is fine.  But wait, where
> the another 22-byte instruction on the address 0x16 (line 2) go?  Before
> describing, we'll see what's going on in the line 1.
>
> 1.  s_riscv_insn (".insn" directive) is called
>     a.  riscv_ip (".insn" with known instruction encoding) is called
>         but fails with "unrecognized opcode".
>     b.  riscv_ip_hardcode (".insn" with raw values) is called
>         and succeeds.
>     c.  append_insn is called **when imm_expr.X_op is not O_big**)
>         -> Go to 2
> 2.  append_insn (output an instruction)
>     a.  dwarf2_emit_insn is called (DWARF debug information is appended)
>     b.  relocation / fixups
>         (if relaxed instructions are emitted, return early)
>     c.  add_fixed_insn (add the instruction to the end of the output)
>
> The reason why it fails on line 2 (O_big) is 1.c.  Because DWARF debug
> information is appended on 2.a., this part is not called because append_insn
> is not called when imm_expr.X_op is O_big.  Instead, it does completely
> separate handling on the riscv_ip_hardcode function (1.b.).
>
> My patchset (PATCH 2/2) unifies this separate handling with regular
> instructions.  That means, 1.c. is changed so that append_insn is called
> even if imm_expr.X_op is O_big.  To unify handling, it adds insn_long_opcode
> (long opcode encoding up to 22-bytes) to struct riscv_cl_insn and the
> install_insn function (called by add_fixed_insn function) is modified to
> handle long instructions.  riscv_ip_hardcode function is modified to store
> long instruction encodings to new insn_long_opcode field.
>
> As a result, ".insn" directive with a big number emits correct
> debug information like this:
>
>     a.o:     file format elf64-littleriscv
>
>     Contents of the .debug_line section:
>
>     CU: a.s:
>     File name                            Line number    Starting address    View    Stmt
>     a.s                                            1                   0               x
>     a.s                                            2                0x16       1       x
>     a.s                                            -                0x2c
>
>
>
> [Disassembler: Instruction is trimmed with 64-bits]
>
> In this section, we reuse the object file generated by the section above
> (two 22-byte instructions).
>
>     0000000000000000 <.text>:
>        0:   607f 0000 0000 0000     .byte   0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   0000 0000 0000 0000
>       10:   0000 0000 0000
>       16:   607f 33cc 55aa cdef     .byte   0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>       1e:   89ab 4567 0123 3210                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       26:   7654 ba98 fedc                                                                  zeroed out after first 64-bits
>
> See ".byte" at the address 0x16.  It's trimmed at 64-bits.
> The resolution is simple.  If we simply add a char* argument (containing all
> instruction bits), we can easily resolve this.
>
>     0000000000000000 <.text>:
>        0:   607f 0000 0000 0000     .byte   0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   0000 0000 0000 0000
>       10:   0000 0000 0000
>       16:   607f 33cc 55aa cdef     .byte   0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe
>       1e:   89ab 4567 0123 3210                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       26:   7654 ba98 fedc                                                                  all instruction bits are correct
>
>
>
> With this patchset, although we don't support any "valid" long instructions
> yet, we can support long "invalid" instructions (only emitted through
> ".insn" and the encoded bits do not match to any valid instructions and
> printed as ".byte").
>
> This patchset is also important because it makes the assembler behavior
> consistent (does not depend whether the raw value can be represented with
> a small constant).
>
>
>
> Thanks,
> Tsukasa
>
>
>
>
> Tsukasa OI (2):
>   RISC-V: Make .insn tests stricter
>   RISC-V: Better support for long instructions

Thanks for Jan's feedback and it makes sense.  I have no further
opinion here, so I will totally respect and follow Jan judgment for
this series.  Anyway, thanks for spending time to fix this.

Nelson


>  gas/config/tc-riscv.c                | 50 +++++++++++++++++++++++-----
>  gas/testsuite/gas/riscv/insn-dwarf.d | 10 +++++-
>  gas/testsuite/gas/riscv/insn-na.d    | 28 ++++++++++------
>  gas/testsuite/gas/riscv/insn.d       | 32 +++++++++++++++---
>  gas/testsuite/gas/riscv/insn.s       |  9 +++++
>  opcodes/riscv-dis.c                  | 13 +++++---
>  6 files changed, 113 insertions(+), 29 deletions(-)
>
>
> base-commit: 15253318be0995200cc59929ca32eedbfd041e45
> --
> 2.38.1
>