Re: [PATCH 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits])

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Nelson Chu <nelson@rivosinc.com>
To: Tsukasa OI <research_trasio@irq.a4lg.com>
Cc: Jan Beulich <jbeulich@suse.com>,
	Kito Cheng <kito.cheng@sifive.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	binutils@sourceware.org
Subject: Re: [PATCH 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits])
Date: Tue, 22 Nov 2022 08:43:15 +0800	[thread overview]
Message-ID: <CAPpQWtD0yWw7B_3xAh8XpAiuoai48yr_G86Zzb_dSnCQKAz+Cw@mail.gmail.com> (raw)
In-Reply-To: <cover.1668841829.git.research_trasio@irq.a4lg.com>

On Sat, Nov 19, 2022 at 3:11 PM Tsukasa OI <research_trasio@irq.a4lg.com> wrote:
>
> Hello,
>
> Commit bb996692bd96 ("RISC-V/gas: allow generating up to 176-bit
> instructions with .insn") was suddenly merged into master by Jan Beulich.
> Although this encoding is not considered frozen (see the section "Expanded
> Instruction-Length Encoding" in the ISA Manual), such attempt makes sense.
>
> However, it was insufficient in some ways.  I quickly found a stack-based
> buffer overflow and fixed that.  Besides that, I found following issues
> related to long (assembler: ".insn"-based, disassembler: unrecognized)
> instructions:
>
>
>
> [Assembler: False "value conflicts with instruction length" error]
>
>     .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 44 hexadecimal digits (22 bytes)
>
> GAS successfully compiles this (0x607f is one of the 2-byte prefixes of
> 22-byte instructions).  However, it fails on following examples.
>
>     .insn 22, 0xfedcba98765432100123456789ab607f
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 32 hexadecimal digits (16 bytes)
>
> with following error:
>
>     a.s: Assembler messages:
>     a.s:1: Error: value conflicts with instruction length `22,0xfedcba98765432100123456789ab607f'
>
> This is not good.  Yes, the value is 16 bytes but upper 6-bytes of a 22-byte
> instruction could be just zeroed out.  It should be valid just like:
>
>     .insn 22, 0x607f
>
> This is a testcase from insn.s and current GAS can handle this.  The only
> reason which the example above fails is, the 16-byte constant is handled as
> O_big (a big number).  The root cause of this is:
>
>     if (bytes != imm_expr->X_add_number * CHARS_PER_LITTLENUM)
>       return _("value conflicts with instruction length");
>
> where:
>
>     bytes                  : Expected instruction length
>     imm_expr->X_add_number : MINIMUM Number of entries (of generic_bignum)
>                              required to represent a big number.
>                              Non-zero if O_big.
>     CHARS_PER_LITTLENUM    : A big number is represented as an array of
>                              little numbers (current GAS: unsigned short).
>                              This is the number of chars of a little number.
>
> That means, if a big number can be represented with different bytes as
> expected, ".insn" fails with "value conflicts with instruction length".
>
> Replacing this with:
>
>     if (bytes < imm_expr->X_add_number * CHARS_PER_LITTLENUM) /* == -> <  */
>       return _("value conflicts with instruction length");
>
> would resolve this problem on the current GAS but this depends on how
> the big numbers are represented.  If big number changes its base type
> (such as 2^32), it will stop working.
>
> My solution (on PATCH 2/2) is much complex.  It computes exact bytes
> required to repsent a big number and fails only if number of represented
> bytes exceeds expected number of bytes.
>
>     unsigned int llen = 0;
>     for (LITTLENUM_TYPE lval = generic_bignum[imm_expr->X_add_number - 1]; lval != 0; llen++)
>       lval >>= BITS_PER_CHAR;
>     unsigned int repr_bytes = (imm_expr->X_add_number - 1) * CHARS_PER_LITTLENUM + llen;
>     if (bytes < repr_bytes)
>       return _("value conflicts with instruction length");
>
> Although it may not work on all possible bigint encoding configurations, my
> implementation is flexible enough that any realistic bigint configurations
> would be compatible.
>
> This is the result after this fix (and the disassembler fix):
>
>        0:   607f 89ab 4567 0123     .byte   0x7f, 0x60, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   3210 7654 ba98 fedc
>       10:   0000 0000 0000
>
>
>
> [Assembler: DWARF line number information is missing with big numbers]
>
> We can compile this on the current GAS (note: line number is prefixed):
>
> 1:  .insn 22, 0x607f
> 2:  .insn 22, 0xfedcba98765432100123456789abcdef55aa33cc607f
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                 44 hexadecimal digits (22 bytes)
>
> with "as -gdwarf-2 -o a.o -c a.s"
>   or "as -g -o a.o -c a.s".
>
> However, if we extract the debug information, the result will be rather
> strange.  Here's the result of "objdump -WL a.o":
>
>     a.o:     file format elf64-littleriscv
>
>     Contents of the .debug_line section:
>
>     CU: a.s:
>     File name                            Line number    Starting address    View    Stmt
>     a.s                                            1                   0               x
>     a.s                                            -                0x2c
>
> A 22-byte instruction on the address 0 (line 1) is fine.  But wait, where
> the another 22-byte instruction on the address 0x16 (line 2) go?  Before
> describing, we'll see what's going on in the line 1.
>
> 1.  s_riscv_insn (".insn" directive) is called
>     a.  riscv_ip (".insn" with known instruction encoding) is called
>         but fails with "unrecognized opcode".
>     b.  riscv_ip_hardcode (".insn" with raw values) is called
>         and succeeds.
>     c.  append_insn is called **when imm_expr.X_op is not O_big**)
>         -> Go to 2
> 2.  append_insn (output an instruction)
>     a.  dwarf2_emit_insn is called (DWARF debug information is appended)
>     b.  relocation / fixups
>         (if relaxed instructions are emitted, return early)
>     c.  add_fixed_insn (add the instruction to the end of the output)
>
> The reason why it fails on line 2 (O_big) is 1.c.  Because DWARF debug
> information is appended on 2.a., this part is not called because append_insn
> is not called when imm_expr.X_op is O_big.  Instead, it does completely
> separate handling on the riscv_ip_hardcode function (1.b.).
>
> My patchset (PATCH 2/2) unifies this separate handling with regular
> instructions.  That means, 1.c. is changed so that append_insn is called
> even if imm_expr.X_op is O_big.  To unify handling, it adds insn_long_opcode
> (long opcode encoding up to 22-bytes) to struct riscv_cl_insn and the
> install_insn function (called by add_fixed_insn function) is modified to
> handle long instructions.  riscv_ip_hardcode function is modified to store
> long instruction encodings to new insn_long_opcode field.
>
> As a result, ".insn" directive with a big number emits correct
> debug information like this:
>
>     a.o:     file format elf64-littleriscv
>
>     Contents of the .debug_line section:
>
>     CU: a.s:
>     File name                            Line number    Starting address    View    Stmt
>     a.s                                            1                   0               x
>     a.s                                            2                0x16       1       x
>     a.s                                            -                0x2c
>
>
>
> [Disassembler: Instruction is trimmed with 64-bits]
>
> In this section, we reuse the object file generated by the section above
> (two 22-byte instructions).
>
>     0000000000000000 <.text>:
>        0:   607f 0000 0000 0000     .byte   0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   0000 0000 0000 0000
>       10:   0000 0000 0000
>       16:   607f 33cc 55aa cdef     .byte   0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>       1e:   89ab 4567 0123 3210                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       26:   7654 ba98 fedc                                                                  zeroed out after first 64-bits
>
> See ".byte" at the address 0x16.  It's trimmed at 64-bits.
> The resolution is simple.  If we simply add a char* argument (containing all
> instruction bits), we can easily resolve this.
>
>     0000000000000000 <.text>:
>        0:   607f 0000 0000 0000     .byte   0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   0000 0000 0000 0000
>       10:   0000 0000 0000
>       16:   607f 33cc 55aa cdef     .byte   0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe
>       1e:   89ab 4567 0123 3210                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       26:   7654 ba98 fedc                                                                  all instruction bits are correct
>
>
>
> With this patchset, although we don't support any "valid" long instructions
> yet, we can support long "invalid" instructions (only emitted through
> ".insn" and the encoded bits do not match to any valid instructions and
> printed as ".byte").
>
> This patchset is also important because it makes the assembler behavior
> consistent (does not depend whether the raw value can be represented with
> a small constant).
>
>
>
> Thanks,
> Tsukasa
>
>
>
>
> Tsukasa OI (2):
>   RISC-V: Make .insn tests stricter
>   RISC-V: Better support for long instructions

Thanks for Jan's feedback and it makes sense.  I have no further
opinion here, so I will totally respect and follow Jan judgment for
this series.  Anyway, thanks for spending time to fix this.

Nelson


>  gas/config/tc-riscv.c                | 50 +++++++++++++++++++++++-----
>  gas/testsuite/gas/riscv/insn-dwarf.d | 10 +++++-
>  gas/testsuite/gas/riscv/insn-na.d    | 28 ++++++++++------
>  gas/testsuite/gas/riscv/insn.d       | 32 +++++++++++++++---
>  gas/testsuite/gas/riscv/insn.s       |  9 +++++
>  opcodes/riscv-dis.c                  | 13 +++++---
>  6 files changed, 113 insertions(+), 29 deletions(-)
>
>
> base-commit: 15253318be0995200cc59929ca32eedbfd041e45
> --
> 2.38.1
>

next prev parent reply	other threads:[~2022-11-22  0:43 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-19  7:10 Tsukasa OI
2022-11-19  7:10 ` [PATCH 1/2] RISC-V: Make .insn tests stricter Tsukasa OI
2022-11-21  7:32   ` Jan Beulich
2022-11-23  8:20     ` Tsukasa OI
2022-11-23  8:56       ` Jan Beulich
2022-11-19  7:10 ` [PATCH 2/2] RISC-V: Better support for long instructions Tsukasa OI
2022-11-21  7:37   ` Jan Beulich
2022-11-23  8:40     ` Tsukasa OI
2022-11-23  8:44       ` Jan Beulich
2022-11-23  8:51         ` Tsukasa OI
2022-11-25  1:38       ` Nelson Chu
2022-11-25  2:33         ` Tsukasa OI
2022-11-22  0:43 ` Nelson Chu [this message]
2022-11-23  8:30 ` [PATCH v2 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) Tsukasa OI
2022-11-23  8:30   ` [PATCH v2 1/2] RISC-V: Make .insn tests stricter Tsukasa OI
2022-11-23  8:30   ` [PATCH v2 2/2] RISC-V: Better support for long instructions Tsukasa OI
2022-11-23  9:04     ` Jan Beulich
2022-11-24  2:34       ` Tsukasa OI
2022-11-24  7:31         ` Jan Beulich
2022-11-24  7:35           ` Tsukasa OI
2022-11-25  2:17   ` [PATCH v3 0/2] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) Tsukasa OI
2022-11-25  2:17     ` [PATCH v3 1/2] RISC-V: Better support for long instructions (disassembler) Tsukasa OI
2022-11-25  8:03       ` Jan Beulich
2022-11-25  2:17     ` [PATCH v3 2/2] RISC-V: Better support for long instructions (assembler) Tsukasa OI
2022-11-25  8:15       ` Jan Beulich
2022-11-25  8:39         ` Tsukasa OI
2022-11-25  9:04           ` Jan Beulich
2022-11-25  9:18             ` Tsukasa OI
2022-11-25  9:56               ` Jan Beulich
2022-11-25 11:07                 ` Tsukasa OI
2022-11-25 11:41     ` [PATCH v3 0/3] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) Tsukasa OI
2022-11-25 11:41       ` [PATCH v3 1/3] RISC-V: Better support for long instructions (disassembler) Tsukasa OI
2022-11-25 11:42     ` [PATCH v4 0/3] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) Tsukasa OI
2022-11-25 11:42       ` [PATCH v4 1/3] RISC-V: Better support for long instructions (disassembler) Tsukasa OI
2022-11-25 11:42       ` [PATCH v4 2/3] RISC-V: Better support for long instructions (assembler) Tsukasa OI
2022-11-25 11:42       ` [PATCH v4 3/3] RISC-V: Better support for long instructions (tests) Tsukasa OI
2022-11-25 13:08       ` [PATCH v4 0/3] RISC-V: Better support for long instructions (64 < x <= 176 [bits]) Jan Beulich
2022-11-28  1:53         ` Nelson Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPpQWtD0yWw7B_3xAh8XpAiuoai48yr_G86Zzb_dSnCQKAz+Cw@mail.gmail.com \
    --to=nelson@rivosinc.com \
    --cc=binutils@sourceware.org \
    --cc=jbeulich@suse.com \
    --cc=kito.cheng@sifive.com \
    --cc=palmer@dabbelt.com \
    --cc=research_trasio@irq.a4lg.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).