From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=iYOW=3Z=irq.a4lg.com=research_trasio@sourceware.org>
Received: from mail-sender-0.a4lg.com (mail-sender.a4lg.com [153.120.152.154])
	by sourceware.org (Postfix) with ESMTPS id 92A013947419
	for <binutils@sourceware.org>; Fri, 25 Nov 2022 08:39:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 92A013947419
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=irq.a4lg.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=irq.a4lg.com
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by mail-sender-0.a4lg.com (Postfix) with ESMTPSA id 08A55300089;
	Fri, 25 Nov 2022 08:39:39 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=irq.a4lg.com;
	s=2017s01; t=1669365579;
	bh=HI9edsva13mTKXxYTnYFtxxSKHbApIu5lk3YSTkSBWY=;
	h=Message-ID:Date:Mime-Version:Subject:To:Cc:References:From:
	 In-Reply-To:Content-Type:Content-Transfer-Encoding;
	b=Q09z/zg6k53jKoJ1BmpgHhj7rhTVt5Zh74r+Lv+L1AlN/6fCbT98jsH0pLeb08DVD
	 2fLaqFhCQeVTqKkUeixm/UklpXcAuYuFr1ptX+yntoncnnwa2hBShpTsj765qxQenm
	 ozIAwYvoZk+DMHYa0UWkEYjqqRgEX46ftAyjH5vo=
Message-ID: <5f723e04-ee60-08e2-0314-51ebd9418947@irq.a4lg.com>
Date: Fri, 25 Nov 2022 17:39:36 +0900
Mime-Version: 1.0
Subject: Re: [PATCH v3 2/2] RISC-V: Better support for long instructions
 (assembler)
Content-Language: en-US
To: Jan Beulich <jbeulich@suse.com>
Cc: binutils@sourceware.org
References: <cover.1669192210.git.research_trasio@irq.a4lg.com>
 <cover.1669342633.git.research_trasio@irq.a4lg.com>
 <47d1751320314f02bede4f6096c09b7f6585c94d.1669342633.git.research_trasio@irq.a4lg.com>
 <a2b10caf-299a-7481-91c7-59b495c2b1c6@suse.com>
From: Tsukasa OI <research_trasio@irq.a4lg.com>
In-Reply-To: <a2b10caf-299a-7481-91c7-59b495c2b1c6@suse.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>

On 2022/11/25 17:15, Jan Beulich wrote:
> On 25.11.2022 03:17, Tsukasa OI wrote:
>> From: Tsukasa OI <research_trasio@irq.a4lg.com>
>>
>> Commit bb996692bd96 ("RISC-V/gas: allow generating up to 176-bit
>> instructions with .insn") tried to start supporting long instructions but
>> it was insufficient.
>>
>> 1.  It heavily depended on the bignum internals (radix of 2^16),
>> 2.  It generates "value conflicts with instruction length" even if a big
>>     number instruction encoding does not exceed its expected length,
>> 3.  Because long opcode was handled separately (from struct riscv_cl_insn),
>>     some information like DWARF line number correspondence was missing and
>>
>> To solve these problems, this commit:
>>
>> 1.  Handles bignum (and its encodings) precisely and
>> 2.  Incorporates long opcode handling into regular instruction handling.
>>
>> gas/ChangeLog:
>>
>> 	* config/tc-riscv.c (struct riscv_cl_insn): Add long opcode field.
>> 	(create_insn) Clear long opcode marker.
>> 	(install_insn) Install longer opcode as well.
>> 	(s_riscv_insn) Likewise.
>> 	(riscv_ip_hardcode): Make big number handling stricter. Length and
>> 	the value conflicts only if the bignum size exceeds the expected
>> 	maximum length.
>> 	* testsuite/gas/riscv/insn.s: Add testcases such that big number
>> 	handling is required.
>> 	* testsuite/gas/riscv/insn.d: Likewise.
>> 	* testsuite/gas/riscv/insn-na.d: Likewise.
>> 	* testsuite/gas/riscv/insn-dwarf.d: Likewise.
>> ---
>>  gas/config/tc-riscv.c                | 38 ++++++++++++++++++++++------
>>  gas/testsuite/gas/riscv/insn-dwarf.d | 10 +++++++-
>>  gas/testsuite/gas/riscv/insn-na.d    |  8 ++++++
>>  gas/testsuite/gas/riscv/insn.d       | 22 ++++++++++++++++
>>  gas/testsuite/gas/riscv/insn.s       |  9 +++++++
>>  5 files changed, 78 insertions(+), 9 deletions(-)
>>
>> diff --git a/gas/config/tc-riscv.c b/gas/config/tc-riscv.c
>> index 019545171f5e..2237062d8b45 100644
>> --- a/gas/config/tc-riscv.c
>> +++ b/gas/config/tc-riscv.c
>> @@ -45,6 +45,9 @@ struct riscv_cl_insn
>>    /* The encoded instruction bits.  */
>>    insn_t insn_opcode;
>>  
>> +  /* The long encoded instruction bits ([0] is non-zero if used).  */
>> +  char insn_long_opcode[RISCV_MAX_INSN_LEN];
>> +
>>    /* The frag that contains the instruction.  */
>>    struct frag *frag;
>>  
>> @@ -714,6 +717,7 @@ create_insn (struct riscv_cl_insn *insn, const struct riscv_opcode *mo)
>>  {
>>    insn->insn_mo = mo;
>>    insn->insn_opcode = mo->match;
>> +  insn->insn_long_opcode[0] = 0;  /* Long insn has non-zero value.  */
>>    insn->frag = NULL;
>>    insn->where = 0;
>>    insn->fixp = NULL;
>> @@ -725,7 +729,10 @@ static void
>>  install_insn (const struct riscv_cl_insn *insn)
>>  {
>>    char *f = insn->frag->fr_literal + insn->where;
>> -  number_to_chars_littleendian (f, insn->insn_opcode, insn_length (insn));
>> +  if (insn->insn_long_opcode[0] != 0)
>> +    memcpy (f, insn->insn_long_opcode, insn_length (insn));
>> +  else
>> +    number_to_chars_littleendian (f, insn->insn_opcode, insn_length (insn));
>>  }
>>  
>>  /* Move INSN to offset WHERE in FRAG.  Adjust the fixups accordingly
>> @@ -3481,7 +3488,9 @@ riscv_ip_hardcode (char *str,
>>  	  values[num++] = (insn_t) imm_expr->X_add_number;
>>  	  break;
>>  	case O_big:
>> -	  values[num++] = generic_bignum[0];
>> +	  /* Extract lower 32-bits of a big number.
>> +	     Assume that generic_bignum_to_int32 work on such number.  */
>> +	  values[num++] = (insn_t) generic_bignum_to_int32 ();
>>  	  break;
>>  	default:
>>  	  /* The first value isn't constant, so it should be
>> @@ -3508,12 +3517,25 @@ riscv_ip_hardcode (char *str,
>>  
>>    if (imm_expr->X_op == O_big)
>>      {
>> -      if (bytes != imm_expr->X_add_number * CHARS_PER_LITTLENUM)
>> +      unsigned int llen = 0;
>> +      for (LITTLENUM_TYPE lval = generic_bignum[imm_expr->X_add_number - 1];
>> +	   lval != 0; llen++)
>> +	lval >>= BITS_PER_CHAR;
>> +      unsigned int repr_bytes
>> +	  = (imm_expr->X_add_number - 1) * CHARS_PER_LITTLENUM + llen;
>> +      if (bytes < repr_bytes)
>>  	return _("value conflicts with instruction length");
>> -      char *f = frag_more (bytes);
>> -      for (num = 0; num < imm_expr->X_add_number; ++num)
>> -	number_to_chars_littleendian (f + num * CHARS_PER_LITTLENUM,
>> -	                              generic_bignum[num], CHARS_PER_LITTLENUM);
>> +      for (num = 0; num < imm_expr->X_add_number - 1; ++num)
>> +	number_to_chars_littleendian (
>> +	    ip->insn_long_opcode + num * CHARS_PER_LITTLENUM,
>> +	    generic_bignum[num],
>> +	    CHARS_PER_LITTLENUM);
>> +      if (llen != 0)
>> +	number_to_chars_littleendian (
>> +	    ip->insn_long_opcode + num * CHARS_PER_LITTLENUM,
>> +	    generic_bignum[num],
>> +	    llen);
>> +      memset(ip->insn_long_opcode + repr_bytes, 0, bytes - repr_bytes);
>>        return NULL;
>>      }
>>  
>> @@ -4590,7 +4612,7 @@ s_riscv_insn (int x ATTRIBUTE_UNUSED)
>>        else
>>  	as_bad ("%s `%s'", error.msg, error.statement);
>>      }
>> -  else if (imm_expr.X_op != O_big)
>> +  else
>>      {
>>        gas_assert (insn.insn_mo->pinfo != INSN_MACRO);
>>        append_insn (&insn, &imm_expr, imm_reloc);
>> diff --git a/gas/testsuite/gas/riscv/insn-dwarf.d b/gas/testsuite/gas/riscv/insn-dwarf.d
>> index 89dc8d58ff09..b8bd42dff18c 100644
>> --- a/gas/testsuite/gas/riscv/insn-dwarf.d
>> +++ b/gas/testsuite/gas/riscv/insn-dwarf.d
>> @@ -74,5 +74,13 @@ insn.s +69 +0xf6.*
>>  insn.s +70 +0xfe.*
>>  insn.s +71 +0x108.*
>>  insn.s +72 +0x114.*
>> -insn.s +- +0x12a
>> +insn.s +74 +0x12a.*
>> +insn.s +75 +0x134.*
>> +insn.s +76 +0x13e.*
>> +insn.s +77 +0x154.*
>> +insn.s +78 +0x16a.*
>> +insn.s +79 +0x180.*
>> +insn.s +80 +0x196.*
>> +insn.s +81 +0x1ac.*
>> +insn.s +- +0x1c2
>>  #pass
>> diff --git a/gas/testsuite/gas/riscv/insn-na.d b/gas/testsuite/gas/riscv/insn-na.d
>> index 66dce71ebc21..6928ba9ba0f2 100644
>> --- a/gas/testsuite/gas/riscv/insn-na.d
>> +++ b/gas/testsuite/gas/riscv/insn-na.d
>> @@ -73,3 +73,11 @@ Disassembly of section .text:
>>  [^:]+:[ 	]+007f 0000 0000 0000 0000[ 	]+[._a-z].*
>>  [^:]+:[ 	]+0000107f 00000000 00000000[ 	]+[._a-z].*
>>  [^:]+:[ 	]+607f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000[ 	]+[._a-z].*
>> +[^:]+:[ 	]+007f 0000 0000 0000 8000[ 	]+\.byte[ 	]+0x7f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80
>> +[^:]+:[ 	]+007f 0000 0000 0000 8000[ 	]+\.byte[ 	]+0x7f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80
>> +[^:]+:[ 	]+607f 89ab 4567 0123 3210 7654 ba98 fedc 0000 0000 0000[ 	]+\.byte[ 	]+0x7f, 0x60, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
> 
> I have to admit that I still don't see what good the ".byte ..." part of
> the expectations does for the purpose of the test. In the cover letter
> you say "They are not 4-byte aligned (10 and 22-bytes) and unlikely to
> change any time soon." But changing that is exactly my plan (unless
> objected to by the arch maintainers): Showing insn components as bytes
> is imo reasonable for RISC-V at most when things aren't even 2-byte
> aligned. IOW I'd see these to be "disassembled" to ".2byte ...",
> matching the "raw opcode" output left to .<N>byte. In fact when raw
> opcodes are output I question the need for any .<N>byte - it's fully
> redundant>
> Bottom line: As before I'd prefer if these parts were dropped (to limit
> the churn on the files when changing the .<N>byte granularity), but I'm
> not going to insist. Apart from this the change looks good to me.
> 
> Jan
> 

Okay, I have to admit that I misunderstood your intent.

Quoting my PATCH v1 cover letter:

> [Disassembler: Instruction is trimmed with 64-bits]
> 
> In this section, we reuse the object file generated by the section above
> (two 22-byte instructions).
> 
>     0000000000000000 <.text>:
>        0:   607f 0000 0000 0000     .byte   0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   0000 0000 0000 0000
>       10:   0000 0000 0000
>       16:   607f 33cc 55aa cdef     .byte   0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>       1e:   89ab 4567 0123 3210                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       26:   7654 ba98 fedc                                                                  zeroed out after first 64-bits
> 
> See ".byte" at the address 0x16.  It's trimmed at 64-bits.
> The resolution is simple.  If we simply add a char* argument (containing all
> instruction bits), we can easily resolve this.
> 
>     0000000000000000 <.text>:
>        0:   607f 0000 0000 0000     .byte   0x7f, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
>        8:   0000 0000 0000 0000
>       10:   0000 0000 0000
>       16:   607f 33cc 55aa cdef     .byte   0x7f, 0x60, 0xcc, 0x33, 0xaa, 0x55, 0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01, 0x10, 0x32, 0x54, 0x76, 0x98, 0xba, 0xdc, 0xfe
>       1e:   89ab 4567 0123 3210                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>       26:   7654 ba98 fedc                                                                  all instruction bits are correct

At least, I want to test whether disassembly of this part (after first
64-bits of an instruction) is fixed.  That is exactly what's fixed in
PATCH 1/2 and I want to make sure that this part is changed correctly.

If you change the output, you can freely remove or replace the testcase
according to the new output format but until that happens, I want to
keep those.  What am I missing?

Tsukasa