public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler
@ 2022-06-03 12:05 Tsukasa OI
  2022-06-03 12:05 ` [PATCH 1/3] RISC-V: Split disasm opcode matching and printing Tsukasa OI
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Tsukasa OI @ 2022-06-03 12:05 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt; +Cc: binutils

Hello,

I noticed another disassembler issue and this is a fix.


As I submitted in my previous patchset:
<https://sourceware.org/pipermail/binutils/2022-June/121159.html>,
there is a problem while disassembling zext.h instruction on
RV{32,64}_Zbb_Zbkb configuration.

On RV32, zext.h on Zbb is a specialized form of pack (Zbkb).
On RV64, zext.h on Zbb is a specialized form of packw (Zbkb).

Note that, if Zbkb extension is disabled, zext.h is a single
instruction with no "generalized" forms.


When **both** Zbb and Zbkb extensions are enabled and a zext.h
instruction is disassembled with `-M no-aliases', it should print
a non-alias instruction (that is, either pack or packw).

    # Input source file
    _start:
        zext.h a0, a1   # Specialized form on Zbb

    # Expected (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
    80000028 <_start>:
    80000028:	0805c533          	pack	a0,a1,zero

However, it prints an alias.

    # Actual (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
    80000028 <_start>:
    80000028:	0805c533          	zext.h	a0,a1

Note that, depending on -march, an "alias" is no longer an alias
(as I noted earlier).

    # Expected/Actual (RV32_Zbb, objdump -d -M no-aliases)
    80000028 <_start>:
    80000028:	0805c533          	zext.h	a0,a1


In general, this kind of issue occurs when:

1.  There are two instructions
    (one of them is a specialized form of another).
2.  Those requirements are different (separate) but can co-exist.

Because of 2., both instructions cannot be simple aliases (INSN_ALIAS
cannot be used).  However on non-alias instructions, if a match is
found, riscv_disassemble_insn thinks this is it and quits too early.

Because zext.h is declared before pack and packw, generalized forms are
not matched for zext.h.




As a solution, I propose a new concept: generic subsets.

For instructions with INSN_GENERICS, opcode matching cannot early-quit.
Instead it searches an instruction with:

-   Longest mask (by default; when -M no-aliases is NOT specified)
-   Shortest mask (when -M no-aliases is specified)

"Length" of the mask equals population count.  More one bits on the mask
means more specialized instruction form.

It fixes disassembler on following instructions and configurations:

-   zext.h  (Zbb)       <--> pack       (Zbkb; RV32)
-   zext.h  (Zbb)       <--> packw      (Zbkb; RV64)

Note that INSN_GENERICS (new flag in PATCH 2) must be set **both** on
specialized and generic forms.  In the example above, those instructions
require INSN_GENERICS:

-   zext.h (two XLEN-specific forms)
-   pack
-   packw

This concept can be used to following instruction pairs where the same
issue can occur once non-ratified instruction is ratified.

-   orc.b   (Zbb)       <--> gorci      (NOT RATIFIED)
-   brev8   (Zbkb)      <--> grevi      (NOT RATIFIED)
-   zip     (Zbkb)      <--> shfli      (NOT RATIFIED)
-   unzip   (Zbkb)      <--> unshfli    (NOT RATIFIED)
-   rev8    (Zbb/Zbkb)  <--> grevi      (NOT RATIFIED)




PATCH 1: Reorganize riscv_disassemble_insn function

riscv_disassemble_insn function (before PATCH 1) works like this:

    FOREACH (opcode)
    {
        if (!opcode.MATCH(insn))
            continue;
        //
        // Print insn with matched opcode.
        //
        return insnlen;
    }
    //
    // opcode not matched.
    //

Because instruction printing code is in the opcode loop, this is not
good to fix the problem.  This patch reorganizes the function like this:

    matched_opcode = NULL;
    // Step 1: opcode matching
    FOREACH (opcode)
    {
        if (!opcode.MATCH(insn))
            continue;
        matched_opcode = opcode;
        break;
    }
    // Step 2: opcode printing
    if (matched_opcode != NULL)
    {
        //
        // Print insn with matched_opcode.
        //
        return insnlen;
    }
    //
    // opcode not matched.
    //

PATCH 1 splits opcode loop to two separate steps: opcode matching and
printing.  With this, we can modify opcode matching step to implement
"generic subsets".




PATCH 2: Implement "generic subsets"

With this commit, riscv_disassemble_insn function works like this:

    int masklen;
    matched_opcode = NULL;
    // Step 1: opcode matching
    FOREACH (opcode)
    {
        if (!opcode.MATCH(insn))
            continue;
        if (!(opcode.pinfo & INSN_GENERICS))
        {
            matched_opcode = opcode;
            break;  // Early-quit as before (for non generic subsets)
        }
        // Handling of generic subsets
        if (matched_opcode == NULL)
        {
            matched_opcode = opcode;
            // Set corresponding masklen.
        }
        else if (no_aliases)
        {
            // Prefer opcode with shortest mask.
            // Update match_opcode and masklen where necessary.
        }
        else
        {
            // Prefer opcode with longest mask.
            // Update match_opcode and masklen where necessary.
        }
    }
    // Step 2: opcode printing (continues...)

PATCH 2 also adds INSN_GENERICS flag to two zext.h forms and pack/packw
instructions.




PATCH 3 contains additional testcases.




Thanks,
Tsukasa




Tsukasa OI (3):
  RISC-V: Split disasm opcode matching and printing
  RISC-V: Implement "generic subsets" for disasm
  RISC-V: Add testcases disassembling zext.h

 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s   |  2 +
 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d   | 11 +++
 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s   |  4 +
 include/opcode/riscv.h                       |  4 +
 opcodes/riscv-dis.c                          | 93 ++++++++++++--------
 opcodes/riscv-opc.c                          |  8 +-
 9 files changed, 114 insertions(+), 41 deletions(-)
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
 create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s


base-commit: 625b6eae091709b95471eae92d42dc6bc71e6553
-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-07-30  4:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-03 12:05 [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI
2022-06-03 12:05 ` [PATCH 1/3] RISC-V: Split disasm opcode matching and printing Tsukasa OI
2022-06-03 12:05 ` [PATCH 2/3] RISC-V: Implement "generic subsets" for disasm Tsukasa OI
2022-06-03 12:05 ` [PATCH 3/3] RISC-V: Add testcases disassembling zext.h Tsukasa OI
2022-07-30  4:20 ` [PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler Tsukasa OI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).