From: Jakub Jelinek <jakub@redhat.com>
To: Richard Henderson <rth@twiddle.net>, Jason Merrill <jason@redhat.com>
Cc: gcc-patches@gcc.gnu.org, gdb@sourceware.org,
Mark Wielaard <mjw@redhat.com>
Subject: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
Date: Wed, 24 Mar 2021 10:31:32 +0100 [thread overview]
Message-ID: <20210324093132.GE186063@tucnak> (raw)
Hi!
On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
9: 41 ff 72 f8 pushq -0x8(%r10)
d: 55 push %rbp
e: 48 89 e5 mov %rsp,%rbp
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 41 52 push %r10
1b: 53 push %rbx
1c: 48 83 ec 20 sub $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
DW_CFA_advance_loc: 5 to 0000000000000005
DW_CFA_def_cfa: r10 (r10) ofs 0
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 13 to 000000000000001b
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that. The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct. So, either we'd need to temporarily
say:
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
DW_CFA_advance_loc: 3 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
DW_CFA_advance_loc: 12 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
The following patch implements the latter. It keeps the call/jump and
flush queue handling as is and only moves the clobbers_queued_reg_save
forced flushes from before the instruction to immediately after the first
instruction. My reading of libgcc unwinder is that while for calls it
executes cfi instruction until < pc (but pc is the end of call anyway),
for async signal stops it executes cfi instruction until <= pc, so it
will only omit cfi instructions starting with advance to a higher loc.
On the testcase from the PR, the difference with the patch is:
@@ -16,9 +16,9 @@ Contents of the .eh_frame section:
00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
DW_CFA_advance_loc: 5 to 0000000000000005
DW_CFA_def_cfa: r10 (r10) ofs 0
- DW_CFA_advance_loc: 9 to 000000000000000e
+ DW_CFA_advance_loc: 12 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
- DW_CFA_advance_loc: 13 to 000000000000001b
+ DW_CFA_advance_loc: 10 to 000000000000001b
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
When looking at other changes, e.g. on cc1plus itself when it has
identical objdump -dr, the without patch to with patch difference
is e.g.:
0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d
DW_CFA_advance_loc: 1 to 0000000000e2fc81
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r6 (rbp) at cfa-16
DW_CFA_advance_loc: 3 to 0000000000e2fc84
DW_CFA_def_cfa_register: r6 (rbp)
- DW_CFA_advance_loc: 6 to 0000000000e2fc8a
+ DW_CFA_advance_loc: 9 to 0000000000e2fc8d
DW_CFA_offset: r15 (r15) at cfa-24
DW_CFA_offset: r14 (r14) at cfa-32
DW_CFA_offset: r13 (r13) at cfa-40
- DW_CFA_advance_loc: 6 to 0000000000e2fc90
+ DW_CFA_advance_loc: 5 to 0000000000e2fc92
DW_CFA_offset: r12 (r12) at cfa-48
DW_CFA_offset: r3 (rbx) at cfa-56
- DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
+ DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
DW_CFA_remember_state
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_advance_loc: 1 to 0000000000e2fe20
DW_CFA_restore_state
DW_CFA_nop
DW_CFA_nop
for
0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
e2fc80: 55 push %rbp
e2fc81: 48 89 e5 mov %rsp,%rbp
e2fc84: 41 57 push %r15
e2fc86: 41 56 push %r14
e2fc88: 41 55 push %r13
e2fc8a: 45 31 ed xor %r13d,%r13d
e2fc8d: 41 54 push %r12
e2fc8f: 53 push %rbx
e2fc90: 31 db xor %ebx,%ebx
e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp
e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp)
...
prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
starts applying before or after the xorl %r13d,%r13d instruction
and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
before or after the xorl %ebx,%ebx instruction.
Or:
-00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
+00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
DW_CFA_advance_loc: 1 to 0000000000e3d161
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r6 (rbp) at cfa-16
DW_CFA_advance_loc: 9 to 0000000000e3d16a
DW_CFA_def_cfa_register: r6 (rbp)
- DW_CFA_advance_loc: 2 to 0000000000e3d16c
+ DW_CFA_advance_loc: 5 to 0000000000e3d16f
DW_CFA_offset: r12 (r12) at cfa-24
- DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
+ DW_CFA_advance_loc: 63 to 0000000000e3d1ae
DW_CFA_def_cfa: r7 (rsp) ofs 8
- DW_CFA_nop
- DW_CFA_nop
- DW_CFA_nop
for:
0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
e3d160: 55 push %rbp
e3d161: 48 63 c2 movslq %edx,%rax
e3d164: 49 89 f8 mov %rdi,%r8
e3d167: 48 89 e5 mov %rsp,%rbp
e3d16a: 41 54 push %r12
e3d16c: 49 89 c4 mov %rax,%r12
e3d16f: 48 83 ec 08 sub $0x8,%rsp
again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
before movq %rax,%r12 instruction or after it.
I think if for some instructions it is possible that debugger or signal
would stop in the middle of instruction, with some side-effects of the
instruction already done and others not yet, then we can't do that,
but at the start of instructions that modify the register the side-effects
of the instruction should not have taken effect yet.
Bootstrapped/regtested on x86_64-linux and i686-linux.
Will e.g. GDB be happy about the changes?
Thinking some more about this, what can be problematic on the GCC side
is inline asm, that can and often will contain multiple instructions.
For that is an easy fix, just test asm_noperands and handle
clobbers_queued_reg_save before the insn rather than after in that case.
But there is another problem, instruction patterns that emit multiple
hw instructions, code can stop in between them. So, do we want some
instruction attribute that would conservatively tell us which instruction
patterns are guaranteed to be single machine instructions?
2021-03-24 Jakub Jelinek <jakub@redhat.com>
PR debug/99334
* dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
forced flushes of queued register saves for non-call/jump insns
that can't throw, emit them first after the insn rather than
before it.
--- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100
+++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100
@@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
dwarf2out_flush_queued_reg_saves ();
}
else if (!NONJUMP_INSN_P (insn)
- || clobbers_queued_reg_save (insn)
|| find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
dwarf2out_flush_queued_reg_saves ();
any_cfis_emitted = false;
add_cfi_insn = insn;
+
+ if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
+ dwarf2out_flush_queued_reg_saves ();
+
scan_insn_after (insn);
control = insn;
}
Jakub
next reply other threads:[~2021-03-24 9:31 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-24 9:31 Jakub Jelinek [this message]
2021-03-25 21:43 ` Jason Merrill
2021-03-26 17:29 ` Jakub Jelinek
2021-03-26 19:52 ` Jason Merrill
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210324093132.GE186063@tucnak \
--to=jakub@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=gdb@sourceware.org \
--cc=jason@redhat.com \
--cc=mjw@redhat.com \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).