From: Andrew Burgess
To: gdb-patches@sourceware.org
Cc: Andrew Burgess
Subject: [PATCH 4/9] gdb: disassembler opcode display formatting
Date: Thu, 23 Jun 2022 17:05:11 +0100

This commit changes the format of 'disassemble /r' to match GNU
objdump. Specifically, GDB will now display the instruction bytes in
the same way as 'objdump --wide --disassemble' does.

Here is an example for RISC-V before this patch:

  (gdb) disassemble /r 0x0001018e,0x0001019e
  Dump of assembler code from 0x1018e to 0x1019e:
     0x0001018e : 03 26 84 fe     lw a2,-24(s0)
     0x00010192 : 83 25 c4 fe     lw a1,-20(s0)
     0x00010196 : 61 65   lui a0,0x18
     0x00010198 : 13 05 85 6a     addi a0,a0,1704
     0x0001019c : f1 22   jal 0x10368
  End of assembler dump.
And here's an example after this patch:

  (gdb) disassemble /r 0x0001018e,0x0001019e
  Dump of assembler code from 0x1018e to 0x1019e:
     0x0001018e : fe842603                lw a2,-24(s0)
     0x00010192 : fec42583                lw a1,-20(s0)
     0x00010196 : 6561                    lui a0,0x18
     0x00010198 : 6a850513                addi a0,a0,1704
     0x0001019c : 22f1                    jal 0x10368
  End of assembler dump.

There are two differences here. First, the instruction bytes after
the patch are grouped based on the size of the instruction, and are
byte-swapped to little-endian order.

Second, after the patch, GDB uses the bytes-per-line hint from
libopcodes to add whitespace padding after the opcode bytes, so in
most cases the instructions are nicely aligned. It is still possible
for a very long instruction to intrude into the disassembled text
space.

The next example is x86-64, before the patch:

  (gdb) disassemble /r main
  Dump of assembler code for function main:
     0x0000000000401106 <+0>:     55      push %rbp
     0x0000000000401107 <+1>:     48 89 e5        mov %rsp,%rbp
     0x000000000040110a <+4>:     c7 87 d8 00 00 00 01 00 00 00   movl $0x1,0xd8(%rdi)
     0x0000000000401114 <+14>:    b8 00 00 00 00  mov $0x0,%eax
     0x0000000000401119 <+19>:    5d      pop %rbp
     0x000000000040111a <+20>:    c3      ret
  End of assembler dump.

And after the patch:

  (gdb) disassemble /r main
  Dump of assembler code for function main:
     0x0000000000401106 <+0>:     55                      push %rbp
     0x0000000000401107 <+1>:     48 89 e5                mov %rsp,%rbp
     0x000000000040110a <+4>:     c7 87 d8 00 00 00 01 00 00 00   movl $0x1,0xd8(%rdi)
     0x0000000000401114 <+14>:    b8 00 00 00 00          mov $0x0,%eax
     0x0000000000401119 <+19>:    5d                      pop %rbp
     0x000000000040111a <+20>:    c3                      ret
  End of assembler dump.

Most instructions are aligned, except for the very long instruction.
Notice too that for x86-64 libopcodes doesn't request that GDB group
the instruction bytes. This matches the behaviour of objdump.

In case the user really wants the old behaviour, I have added a new
modifier, 'disassemble /b', which displays the instruction bytes one
at a time.
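The grouping and byte-swapping described above can be sketched as follows. This is a minimal illustration, not GDB's code. `format_opcode_bytes` is a hypothetical helper, and the `little_endian` flag stands in for the disassembler reporting `display_endian == BFD_ENDIAN_LITTLE`. The RISC-V chunk sizes (4 bytes for a 32-bit instruction, 2 for a compressed one) are assumed from the example output above.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not a GDB function): format instruction bytes the
// way this patch describes for 'disassemble /r'.  BYTES are in memory
// order, CHUNK stands in for the disassembler's bytes_per_chunk hint, and
// LITTLE_ENDIAN stands in for display_endian == BFD_ENDIAN_LITTLE.
std::string
format_opcode_bytes (const std::vector<unsigned char> &bytes, int chunk,
                     bool little_endian)
{
  int size = static_cast<int> (bytes.size ());

  // As in the patch: a missing chunk hint, or one that does not divide
  // the instruction length, falls back to one byte per group.
  if (chunk < 1 || size % chunk != 0)
    chunk = 1;

  std::string out;
  char hex[3];
  for (int i = 0; i < size; i += chunk)
    {
      if (i > 0)
        out += ' ';
      for (int j = 0; j < chunk; ++j)
        {
          // Little-endian display prints the highest-addressed byte of
          // each chunk first; otherwise bytes keep their memory order.
          int k = little_endian ? chunk - 1 - j : j;
          std::snprintf (hex, sizeof hex, "%02x", (unsigned) bytes[i + k]);
          out += hex;
        }
    }
  return out;
}
```

For example, `format_opcode_bytes ({0x03, 0x26, 0x84, 0xfe}, 4, true)` gives `fe842603`, matching the first line of the RISC-V output after the patch. Forcing a chunk size of 1, which is what /b does, gives `03 26 84 fe` instead.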
For x86-64, which never groups instruction bytes, /b and /r are
equivalent, but for RISC-V, using /b gets the old layout back (except
that the whitespace for alignment is still present). Consider our
original RISC-V example, this time using /b:

  (gdb) disassemble /b 0x0001018e,0x0001019e
  Dump of assembler code from 0x1018e to 0x1019e:
     0x0001018e : 03 26 84 fe             lw a2,-24(s0)
     0x00010192 : 83 25 c4 fe             lw a1,-20(s0)
     0x00010196 : 61 65                   lui a0,0x18
     0x00010198 : 13 05 85 6a             addi a0,a0,1704
     0x0001019c : f1 22                   jal 0x10368
  End of assembler dump.

Obviously, this patch is a potentially significant change to the
behaviour of /r. I could have added /b with the new behaviour and
left /r alone. However, I feel the new behaviour is significantly
better than the old, so I made /r use what I consider the "better"
behaviour.

The reason I prefer the new behaviour is that, when I use /r, I almost
always want to manually decode the instruction for some reason, and
having the bytes displayed in "instruction order" rather than memory
order just makes this easier.

The 'record instruction-history' command also takes a /r modifier, and
has been modified in the same way as disassemble; /r gets the new
behaviour, and /b has been added to retain the old behaviour.

Finally, the MI command -data-disassemble is unchanged in behaviour:
internally it now requests the raw bytes of the instruction, which is
equivalent to the /b modifier. This means that the MI output remains
backward compatible.
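To see why /b still aligns the instruction text, the padding that follows the opcode bytes can be sketched the same way. The hypothetical helpers below mirror the padding loop that this patch adds to disasm.c. Their `bytes_per_line` and `bytes_per_chunk` parameters stand in for the libopcodes hints. The hint values used in the usage note are assumptions rather than anything this patch states: 7 bytes per line with no chunking for x86-64, and 8 bytes per line for RISC-V.

```cpp
#include <algorithm>

// Hypothetical helper (not a GDB function): the number of padding spaces
// the patch's loop emits after SIZE instruction bytes.  BYTES_PER_LINE and
// BYTES_PER_CHUNK stand in for the libopcodes disassemble_info hints.
int
opcode_padding (int size, int bytes_per_line, int bytes_per_chunk)
{
  // Same sanity adjustments as the patch.
  bytes_per_line = std::max (bytes_per_line, size);
  bytes_per_chunk = std::max (bytes_per_chunk, 1);
  if (size % bytes_per_chunk != 0)
    bytes_per_chunk = 1;

  int nspaces = 0;
  for (int i = size; i < bytes_per_line; i += bytes_per_chunk)
    {
      // One separating space for every missing chunk except the first...
      if (i > size)
        nspaces++;
      // ...plus two columns for each missing byte.
      nspaces += bytes_per_chunk * 2;
    }
  return nspaces;
}

// Hypothetical helper: total width of the opcode field, that is two hex
// digits per byte, one space between chunks, then the padding above.
int
opcode_field_width (int size, int bytes_per_line, int bytes_per_chunk)
{
  int chunk = std::max (bytes_per_chunk, 1);
  if (size % chunk != 0)
    chunk = 1;
  int printed = size * 2 + (size / chunk - 1);
  return printed + opcode_padding (size, bytes_per_line, bytes_per_chunk);
}
```

Under the assumed x86-64 hint, `opcode_field_width (1, 7, 0)` and `opcode_field_width (5, 7, 0)` are both 19. The 10-byte movl is 29 columns wide, which is why it alone pushes its text out of line. For /b on RISC-V, with a chunk size of 1 and the assumed 8-byte line, the 2-byte and 4-byte instructions both come to 22 columns. As written, the loop pads short instructions to one column less than a completely full line: a 7-byte x86-64 instruction is 20 columns. The tab GDB prints after the field usually absorbs that difference.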
---
 gdb/NEWS                                | 12 +++++++
 gdb/cli/cli-cmds.c                      |  6 ++++
 gdb/disasm-flags.h                      |  1 +
 gdb/disasm.c                            | 43 ++++++++++++++++++++--
 gdb/doc/gdb.texinfo                     | 48 +++++++++++++++++++++----
 gdb/mi/mi-cmd-disas.c                   |  6 ++--
 gdb/record.c                            |  3 ++
 gdb/testsuite/gdb.mi/mi-disassemble.exp |  6 ++--
 8 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/gdb/NEWS b/gdb/NEWS index ac9a1aacd34..3e8853505eb 100644 --- a/gdb/NEWS +++ b/gdb/NEWS @@ -27,6 +27,18 @@ emit to indicate where a breakpoint should be placed to break in a function past its prologue. +* The format of 'disassemble /r' and 'record instruction-history /r' + has changed. The instruction bytes may now be grouped together + and displayed in the endianness of the instruction. This is the + same layout as used by GNU objdump when disassembling. + + There are now 'disassemble /b' and 'record instruction-history /b' + modifiers, which always display the instruction bytes one at a time in + memory order, that is, the byte at the lowest address first. + + For both /r and /b, GDB now makes better use of whitespace to + align the disassembled instruction text. + * New commands maintenance set ignore-prologue-end-flag on|off diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c index 31911ebe61f..4950ee376a1 100644 --- a/gdb/cli/cli-cmds.c +++ b/gdb/cli/cli-cmds.c @@ -1508,6 +1508,9 @@ disassemble_current_function (gdb_disassembly_flags flags) A /r modifier will include raw instructions in hex with the assembly. + A /b modifier is similar to /r except the instruction bytes are printed + as separate bytes with no grouping or endian switching. + A /s modifier will include source code with the assembly, like /m, with two important differences: 1) The output is still in pc address order.
@@ -1546,6 +1549,9 @@ disassemble_command (const char *arg, int from_tty) case 'r': flags |= DISASSEMBLY_RAW_INSN; break; + case 'b': + flags |= DISASSEMBLY_RAW_BYTES; + break; case 's': flags |= DISASSEMBLY_SOURCE; break; diff --git a/gdb/disasm-flags.h b/gdb/disasm-flags.h index 025b6893941..5a7371b0a39 100644 --- a/gdb/disasm-flags.h +++ b/gdb/disasm-flags.h @@ -33,6 +33,7 @@ enum gdb_disassembly_flag DISASSEMBLY_OMIT_PC = (0x1 << 4), DISASSEMBLY_SOURCE = (0x1 << 5), DISASSEMBLY_SPECULATIVE = (0x1 << 6), + DISASSEMBLY_RAW_BYTES = (0x1 << 7), }; DEF_ENUM_FLAGS_TYPE (enum gdb_disassembly_flag, gdb_disassembly_flags); diff --git a/gdb/disasm.c b/gdb/disasm.c index 42351c735d3..946c235e7ef 100644 --- a/gdb/disasm.c +++ b/gdb/disasm.c @@ -321,7 +321,7 @@ gdb_pretty_print_disassembler::pretty_print_insn (const struct disasm_insn *insn throw ex; } - if (flags & DISASSEMBLY_RAW_INSN) + if ((flags & (DISASSEMBLY_RAW_INSN | DISASSEMBLY_RAW_BYTES)) != 0) { /* Build the opcodes using a temporary stream so we can write them out in a single go for the MI. */ @@ -331,14 +331,51 @@ gdb_pretty_print_disassembler::pretty_print_insn (const struct disasm_insn *insn m_opcode_data.resize (size); read_code (pc, m_opcode_data.data (), size); - for (int i = 0; i < size; ++i) + /* The disassembler provides information about the best way to + display the instruction bytes to the user. We provide some sane + defaults in case the disassembler gets it wrong. */ + const struct disassemble_info *di = m_di.disasm_info (); + int bytes_per_line = std::max (di->bytes_per_line, size); + int bytes_per_chunk = std::max (di->bytes_per_chunk, 1); + + /* If the user has requested the instruction bytes be displayed + byte at a time, then handle that here. Also, if the instruction + is not a multiple of the chunk size (which probably indicates a + disassembler problem) then avoid that causing display problems + by switching to byte at a time mode. 
*/ + if ((flags & DISASSEMBLY_RAW_BYTES) != 0 + || (size % bytes_per_chunk) != 0) + bytes_per_chunk = 1; + + /* Print the instruction opcode bytes, grouped into chunks. */ + for (int i = 0; i < size; i += bytes_per_chunk) { if (i > 0) m_opcode_stb.puts (" "); - m_opcode_stb.printf ("%02x", (unsigned) m_opcode_data[i]); + + if (di->display_endian == BFD_ENDIAN_LITTLE) + { + for (int k = bytes_per_chunk; k-- != 0; ) + m_opcode_stb.printf ("%02x", (unsigned) m_opcode_data[i + k]); + } + else + { + for (int k = 0; k < bytes_per_chunk; k++) + m_opcode_stb.printf ("%02x", (unsigned) m_opcode_data[i + k]); + } + } + + /* Calculate required padding. */ + int nspaces = 0; + for (int i = size; i < bytes_per_line; i += bytes_per_chunk) + { + if (i > size) + nspaces++; + nspaces += bytes_per_chunk * 2; } m_uiout->field_stream ("opcodes", m_opcode_stb); + m_uiout->spaces (nspaces); m_uiout->text ("\t"); } diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo index d046ce5891e..af559370db0 100644 --- a/gdb/doc/gdb.texinfo +++ b/gdb/doc/gdb.texinfo @@ -7901,7 +7901,10 @@ It can also print mixed source+disassembly if you specify the the @code{/m} or @code{/s} modifier, and print the raw instructions in hex -as well as in symbolic form by specifying the @code{/r} modifier. +as well as in symbolic form by specifying the @code{/r} or @code{/b} +modifier. The behaviour of the @code{/m}, @code{/s}, @code{/r}, and +@code{/b} modifiers is the same as for the @kbd{disassemble} command +(@pxref{disassemble,,@kbd{disassemble}}). The current position marker is printed for the instruction at the current program counter value. This instruction can appear multiple @@ -9815,6 +9818,7 @@ specifying a location will display information about the next source line.
+@anchor{disassemble} @table @code @kindex disassemble @cindex assembly instructions @@ -9825,16 +9829,17 @@ @itemx disassemble /m @itemx disassemble /s @itemx disassemble /r +@itemx disassemble /b This specialized command dumps a range of memory as machine instructions. It can also print mixed source+disassembly by specifying -the @code{/m} or @code{/s} modifier and print the raw instructions in hex -as well as in symbolic form by specifying the @code{/r} modifier. -The default memory range is the function surrounding the +the @code{/m} or @code{/s} modifier and print the raw instructions in +hex as well as in symbolic form by specifying the @code{/r} or @code{/b} +modifier. The default memory range is the function surrounding the program counter of the selected frame. A single argument to this command is a program counter value; @value{GDBN} dumps the function -surrounding this value. When two arguments are given, they should -be separated by a comma, possibly surrounded by whitespace. The -arguments specify a range of addresses to dump, in one of two forms: +surrounding this value. When two arguments are given, they should be +separated by a comma, possibly surrounded by whitespace. The arguments +specify a range of addresses to dump, in one of two forms: @table @code @item @var{start},@var{end} @@ -9872,6 +9877,35 @@ End of assembler dump. @end smallexample +The following two examples are for RISC-V and demonstrate the +difference between the @code{/r} and @code{/b} modifiers. First, with +@code{/b}, the bytes of the instruction are printed, in hex, in memory +order: + +@smallexample +(@value{GDBP}) disassemble /b 0x00010150,0x0001015c +Dump of assembler code from 0x10150 to 0x1015c: + 0x00010150 : 22 dc sw s0,56(sp) + 0x00010152 : 80 00 addi s0,sp,64 + 0x00010154 : 23 26 a4 fe sw a0,-20(s0) + 0x00010158 : 23 24 b4 fe sw a1,-24(s0) +End of assembler dump.
+@end smallexample + +In contrast, with @code{/r} the bytes of the instruction are displayed +in instruction order; for RISC-V, this means that the bytes have been +swapped to little-endian order: + +@smallexample +(@value{GDBP}) disassemble /r 0x00010150,0x0001015c +Dump of assembler code from 0x10150 to 0x1015c: + 0x00010150 : dc22 sw s0,56(sp) + 0x00010152 : 0080 addi s0,sp,64 + 0x00010154 : fea42623 sw a0,-20(s0) + 0x00010158 : feb42423 sw a1,-24(s0) +End of assembler dump. +@end smallexample + Here is an example showing mixed source+assembly for Intel x86 with @code{/m} or @code{/s}, when the program is stopped just after function prologue in a non-optimized function with no inline code. diff --git a/gdb/mi/mi-cmd-disas.c b/gdb/mi/mi-cmd-disas.c index 387c4900150..c8e06cd940a 100644 --- a/gdb/mi/mi-cmd-disas.c +++ b/gdb/mi/mi-cmd-disas.c @@ -165,16 +165,16 @@ mi_cmd_disassemble (const char *command, char **argv, int argc) disasm_flags |= DISASSEMBLY_SOURCE_DEPRECATED; break; case 2: - disasm_flags |= DISASSEMBLY_RAW_INSN; + disasm_flags |= DISASSEMBLY_RAW_BYTES; break; case 3: - disasm_flags |= DISASSEMBLY_SOURCE_DEPRECATED | DISASSEMBLY_RAW_INSN; + disasm_flags |= DISASSEMBLY_SOURCE_DEPRECATED | DISASSEMBLY_RAW_BYTES; break; case 4: disasm_flags |= DISASSEMBLY_SOURCE; break; case 5: - disasm_flags |= DISASSEMBLY_SOURCE | DISASSEMBLY_RAW_INSN; + disasm_flags |= DISASSEMBLY_SOURCE | DISASSEMBLY_RAW_BYTES; break; default: gdb_assert_not_reached ("bad disassembly mode"); diff --git a/gdb/record.c b/gdb/record.c index 17a5df262bd..2390a58f9c0 100644 --- a/gdb/record.c +++ b/gdb/record.c @@ -494,6 +494,9 @@ get_insn_history_modifiers (const char **arg) case 'r': modifiers |= DISASSEMBLY_RAW_INSN; break; + case 'b': + modifiers |= DISASSEMBLY_RAW_BYTES; + break; case 'f': modifiers |= DISASSEMBLY_OMIT_FNAME; break; diff --git a/gdb/testsuite/gdb.mi/mi-disassemble.exp b/gdb/testsuite/gdb.mi/mi-disassemble.exp index b7c52472c84..ef3337d4062 100644 ---
a/gdb/testsuite/gdb.mi/mi-disassemble.exp +++ b/gdb/testsuite/gdb.mi/mi-disassemble.exp @@ -245,12 +245,12 @@ proc test_disassembly_opcode_format {} { # then disassemble using the MI command. set longest_insn_bytes "" set longest_insn_addr "" - gdb_test_multiple "disassemble /r main" "" { - -re "^disassemble /r main\r\n" { + gdb_test_multiple "disassemble /b main" "" { + -re "^disassemble /b main\r\n" { exp_continue } - -re "^&\"disassemble /r main.n\"\r\n" { + -re "^&\"disassemble /b main.n\"\r\n" { exp_continue } -- 2.25.4