* RISC-V: mapping symbols vs "unimpl" @ 2022-02-14 8:26 Jan Beulich 2022-02-14 10:35 ` Andrew Waterman 0 siblings, 1 reply; 4+ messages in thread From: Jan Beulich @ 2022-02-14 8:26 UTC (permalink / raw) To: Nelson Chu; +Cc: Binutils Nelson, with your introduction of mapping symbols I have trouble expressing certain spec-conforming deliberately illegal 32-bit instructions (low 16 bits all zero). I can't use "unimpl" itself, as that for whichever reason resolves to "csrrw x0, cycle, x0" (and oddly enough using that form as input goes through without even a warning, but that's just a side note). I also can't use .insn, as that places requirements on the low two bits of the main opcode. And now I also can't use .word anymore, as that will cause mapping symbols to be inserted, which therefore - by a disassembler honoring the mapping symbols - won't disassemble as an instruction anymore. Do you have any suggestion how to encode a spec-conforming "unimpl", which I want to be part of my own disassembler's test cases (in particular the 0xffff0000 form)? Since the commit introducing the mapping symbols refers to Arm, I'd like to point out that their .insn equivalents allow to encode entirely arbitrary instruction forms. But of course I understand that RISC-V's insn length encoding scheme is somewhat in conflict with this. Two further remarks: Even ".insn ci ..." cannot be used, not even for forms with the high 16 bits not all set (which again the main opcode restriction would get in the way of): Already just temporarily enabling RVC causes the RVC bit to be set in the ELF header flags. Yet with that bit set 0xffff0000 is actually a 16-bit insn 0x0000 followed by a wider insn with the low 16 bits all set. IOW this conflicts with the spec's wording of "minimal length insn with the low 16 bits all zero". And then I'm puzzled by the main opcode restriction related error being raised a whopping 6 times for ".insn i 0, 0, x0, x30, ~0", and still twice instead of just once for ".insn ci 3, 7, x31, ~0". Jan ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: RISC-V: mapping symbols vs "unimpl" 2022-02-14 8:26 RISC-V: mapping symbols vs "unimpl" Jan Beulich @ 2022-02-14 10:35 ` Andrew Waterman 2022-02-14 11:24 ` Nelson Chu 0 siblings, 1 reply; 4+ messages in thread From: Andrew Waterman @ 2022-02-14 10:35 UTC (permalink / raw) To: Jan Beulich; +Cc: Nelson Chu, Binutils On Mon, Feb 14, 2022 at 12:27 AM Jan Beulich via Binutils <binutils@sourceware.org> wrote: > > Nelson, > > with your introduction of mapping symbols I have trouble expressing > certain spec-conforming deliberately illegal 32-bit instructions (low > 16 bits all zero). I can't use "unimpl" itself, as that for whichever > reason resolves to "csrrw x0, cycle, x0" I just wanted to remark on this seemingly arbitrary choice: we chose it because it's both architecturally guaranteed to be 32 bits wide and architecturally guaranteed to trap. (The CSR in question is definitionally read-only, and writing a read-only CSR must trap.) I realize it's of no help to your query, but I figured it was best not to leave this point dangling. > (and oddly enough using that > form as input goes through without even a warning, but that's just a > side note). I also can't use .insn, as that places requirements on > the low two bits of the main opcode. And now I also can't use .word > anymore, as that will cause mapping symbols to be inserted, which > therefore - by a disassembler honoring the mapping symbols - won't > disassemble as an instruction anymore. > > Do you have any suggestion how to encode a spec-conforming "unimpl", > which I want to be part of my own disassembler's test cases (in > particular the 0xffff0000 form)? Since the commit introducing the > mapping symbols refers to Arm, I'd like to point out that their .insn > equivalents allow to encode entirely arbitrary instruction forms. But > of course I understand that RISC-V's insn length encoding scheme is > somewhat in conflict with this. > > Two further remarks: Even ".insn ci ..." cannot be used, not even for > forms with the high 16 bits not all set (which again the main opcode > restriction would get in the way of): Already just temporarily > enabling RVC causes the RVC bit to be set in the ELF header flags. > Yet with that bit set 0xffff0000 is actually a 16-bit insn 0x0000 > followed by a wider insn with the low 16 bits all set. IOW this > conflicts with the spec's wording of "minimal length insn with the > low 16 bits all zero". > > And then I'm puzzled by the main opcode restriction related error > being raised a whopping 6 times for ".insn i 0, 0, x0, x30, ~0", > and still twice instead of just once for ".insn ci 3, 7, x31, ~0". > > Jan > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: RISC-V: mapping symbols vs "unimpl" 2022-02-14 10:35 ` Andrew Waterman @ 2022-02-14 11:24 ` Nelson Chu 2022-02-14 12:49 ` Jan Beulich 0 siblings, 1 reply; 4+ messages in thread From: Nelson Chu @ 2022-02-14 11:24 UTC (permalink / raw) To: Andrew Waterman; +Cc: Jan Beulich, Binutils Hi Jan, On Mon, Feb 14, 2022 at 6:35 PM Andrew Waterman <andrew@sifive.com> wrote: > > On Mon, Feb 14, 2022 at 12:27 AM Jan Beulich via Binutils > <binutils@sourceware.org> wrote: > > > > Nelson, > > > > with your introduction of mapping symbols I have trouble expressing > > certain spec-conforming deliberately illegal 32-bit instructions (low > > 16 bits all zero). I can't use "unimpl" itself, as that for whichever > > reason resolves to "csrrw x0, cycle, x0" > > I just wanted to remark on this seemingly arbitrary choice: we chose > it because it's both architecturally guaranteed to be 32 bits wide and > architecturally guaranteed to trap. (The CSR in question is > definitionally read-only, and writing a read-only CSR must trap.) I > realize it's of no help to your query, but I figured it was best not > to leave this point dangling. I'm not really sure the history, but according the implementation, https://github.com/riscv-collab/riscv-binutils-gdb/blob/riscv-binutils-2.38/opcodes/riscv-opc.c#L272 The unimp is encoded as "0x000" when rvc is enable, and is encoded as "csrrw x0, cycle, x0" when rvc is disabled. > > (and oddly enough using that > > form as input goes through without even a warning, but that's just a > > side note). I also can't use .insn, as that places requirements on > > the low two bits of the main opcode. And now I also can't use .word > > anymore, as that will cause mapping symbols to be inserted, which > > therefore - by a disassembler honoring the mapping symbols - won't > > disassemble as an instruction anymore. > > > > Do you have any suggestion how to encode a spec-conforming "unimpl", > > which I want to be part of my own disassembler's test cases (in > > particular the 0xffff0000 form)? Since the commit introducing the > > mapping symbols refers to Arm, I'd like to point out that their .insn > > equivalents allow to encode entirely arbitrary instruction forms. But > > of course I understand that RISC-V's insn length encoding scheme is > > somewhat in conflict with this. We are used to checking if the instruction is valid or not for the .insn directive. So this made me think - maybe we should allow .insn to encode any instruction, including illegal formats, just like what Arm/AArch do as you mentioned. > > Two further remarks: Even ".insn ci ..." cannot be used, not even for > > forms with the high 16 bits not all set (which again the main opcode > > restriction would get in the way of): Already just temporarily > > enabling RVC causes the RVC bit to be set in the ELF header flags. > > Yet with that bit set 0xffff0000 is actually a 16-bit insn 0x0000 > > followed by a wider insn with the low 16 bits all set. IOW this > > conflicts with the spec's wording of "minimal length insn with the > > low 16 bits all zero". For now you probably can write 0xffff0000 by two .insn directives. For example, .insn 0x0 + .insn 0xffff, and then you should get something like, Disassembly of section .text: 0000000000000000 <.text>: 0: 0000 unimp 2: ffff .2byte 0xffff Anyway, this is just a workaround. I will find relevant people to discuss whether the .insn directive should be allowed to encode an illegal format. > > And then I'm puzzled by the main opcode restriction related error > > being raised a whopping 6 times for ".insn i 0, 0, x0, x30, ~0", > > and still twice instead of just once for ".insn ci 3, 7, x31, ~0". > > > > Jan > > Thanks Nelson ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: RISC-V: mapping symbols vs "unimpl" 2022-02-14 11:24 ` Nelson Chu @ 2022-02-14 12:49 ` Jan Beulich 0 siblings, 0 replies; 4+ messages in thread From: Jan Beulich @ 2022-02-14 12:49 UTC (permalink / raw) To: Nelson Chu; +Cc: Binutils, Andrew Waterman On 14.02.2022 12:24, Nelson Chu wrote: > On Mon, Feb 14, 2022 at 6:35 PM Andrew Waterman <andrew@sifive.com> wrote: >> On Mon, Feb 14, 2022 at 12:27 AM Jan Beulich via Binutils >> <binutils@sourceware.org> wrote: >>> Do you have any suggestion how to encode a spec-conforming "unimpl", >>> which I want to be part of my own disassembler's test cases (in >>> particular the 0xffff0000 form)? Since the commit introducing the >>> mapping symbols refers to Arm, I'd like to point out that their .insn >>> equivalents allow to encode entirely arbitrary instruction forms. But >>> of course I understand that RISC-V's insn length encoding scheme is >>> somewhat in conflict with this. > > We are used to checking if the instruction is valid or not for the > .insn directive. So this made me think - maybe we should allow .insn > to encode any instruction, including illegal formats, just like what > Arm/AArch do as you mentioned. > >>> Two further remarks: Even ".insn ci ..." cannot be used, not even for >>> forms with the high 16 bits not all set (which again the main opcode >>> restriction would get in the way of): Already just temporarily >>> enabling RVC causes the RVC bit to be set in the ELF header flags. >>> Yet with that bit set 0xffff0000 is actually a 16-bit insn 0x0000 >>> followed by a wider insn with the low 16 bits all set. IOW this >>> conflicts with the spec's wording of "minimal length insn with the >>> low 16 bits all zero". > > For now you probably can write 0xffff0000 by two .insn directives. > For example, .insn 0x0 + .insn 0xffff, and then you should get > something like, > > Disassembly of section .text: > > 0000000000000000 <.text>: > 0: 0000 unimp > 2: ffff .2byte 0xffff > > Anyway, this is just a workaround. I will find relevant people to > discuss whether the .insn directive should be allowed to encode an > illegal format. Hmm, interesting. I wasn't aware of these alternative forms of .insn. But I wonder in how far I can actually rely on the present behavior: ".insn 0xffff" emitting a 16-bit value does not really form a valid 16-bit insn. I'd rather expect the "value conflicts with instruction length" in this case. And if a 16-bit insn is emitted, I'd expect a check for RVC being enabled, which in turn would have led to RVC being recorded in the header (creating the same problem as mentioned before again). But yes, for the immediate purpose with 2.38, "utilizing" its shortcomings, I should be able to use this approach. Looking at riscv_ip_hardcode() I also notice that .insn 8, 0x3f wrongly triggers that very error, due to the undefined behavior of the shift in this case. Jan ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-02-14 12:49 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-14 8:26 RISC-V: mapping symbols vs "unimpl" Jan Beulich 2022-02-14 10:35 ` Andrew Waterman 2022-02-14 11:24 ` Nelson Chu 2022-02-14 12:49 ` Jan Beulich
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).