* [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] @ 2021-03-24 9:31 Jakub Jelinek 2021-03-25 21:43 ` Jason Merrill 0 siblings, 1 reply; 4+ messages in thread From: Jakub Jelinek @ 2021-03-24 9:31 UTC (permalink / raw) To: Richard Henderson, Jason Merrill; +Cc: gcc-patches, gdb, Mark Wielaard Hi! On the testcase in the PR with -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign we have prologue: 0000000000000000 <_func_with_dwarf_issue_>: 0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10 5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp 9: 41 ff 72 f8 pushq -0x8(%r10) d: 55 push %rbp e: 48 89 e5 mov %rsp,%rbp 11: 41 57 push %r15 13: 41 56 push %r14 15: 41 55 push %r13 17: 41 54 push %r12 19: 41 52 push %r10 1b: 53 push %rbx 1c: 48 83 ec 20 sub $0x20,%rsp and emit 00000000 0000000000000014 00000000 CIE Version: 1 Augmentation: "zR" Code alignment factor: 1 Data alignment factor: -8 Return address column: 16 Augmentation data: 1b DW_CFA_def_cfa: r7 (rsp) ofs 8 DW_CFA_offset: r16 (rip) at cfa-8 DW_CFA_nop DW_CFA_nop 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 DW_CFA_advance_loc: 5 to 0000000000000005 DW_CFA_def_cfa: r10 (r10) ofs 0 DW_CFA_advance_loc: 9 to 000000000000000e DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) DW_CFA_advance_loc: 13 to 000000000000001b DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24) DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32) ... unwind info for that. The problem is when async signal (or stepping through in the debugger) stops after the pushq %rbp instruction and before movq %rsp, %rbp, the unwind info says that caller's %rbp is saved there at *%rbp, but that is not true, caller's %rbp is either still available in the %rbp register, or in *%rsp, only after executing the next instruction - movq %rsp, %rbp - the location for %rbp is correct. So, either we'd need to temporarily say: DW_CFA_advance_loc: 9 to 000000000000000e DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0) DW_CFA_advance_loc: 3 to 0000000000000011 DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) DW_CFA_advance_loc: 10 to 000000000000001b or to me it seems more compact to just say: DW_CFA_advance_loc: 12 to 0000000000000011 DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) DW_CFA_advance_loc: 10 to 000000000000001b The following patch implements the latter. It keeps the call/jump and flush queue handling as is and only moves the clobbers_queued_reg_save forced flushes from before the instruction to immediately after the first instruction. My reading of libgcc unwinder is that while for calls it executes cfi instruction until < pc (but pc is the end of call anyway), for async signal stops it executes cfi instruction until <= pc, so it will only omit cfi instructions starting with advance to a higher loc. On the testcase from the PR, the difference with the patch is: @@ -16,9 +16,9 @@ Contents of the .eh_frame section: 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 DW_CFA_advance_loc: 5 to 0000000000000005 DW_CFA_def_cfa: r10 (r10) ofs 0 - DW_CFA_advance_loc: 9 to 000000000000000e + DW_CFA_advance_loc: 12 to 0000000000000011 DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) - DW_CFA_advance_loc: 13 to 000000000000001b + DW_CFA_advance_loc: 10 to 000000000000001b DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) When looking at other changes, e.g. on cc1plus itself when it has identical objdump -dr, the without patch to with patch difference is e.g.: 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d DW_CFA_advance_loc: 1 to 0000000000e2fc81 DW_CFA_def_cfa_offset: 16 DW_CFA_offset: r6 (rbp) at cfa-16 DW_CFA_advance_loc: 3 to 0000000000e2fc84 DW_CFA_def_cfa_register: r6 (rbp) - DW_CFA_advance_loc: 6 to 0000000000e2fc8a + DW_CFA_advance_loc: 9 to 0000000000e2fc8d DW_CFA_offset: r15 (r15) at cfa-24 DW_CFA_offset: r14 (r14) at cfa-32 DW_CFA_offset: r13 (r13) at cfa-40 - DW_CFA_advance_loc: 6 to 0000000000e2fc90 + DW_CFA_advance_loc: 5 to 0000000000e2fc92 DW_CFA_offset: r12 (r12) at cfa-48 DW_CFA_offset: r3 (rbx) at cfa-56 - DW_CFA_advance_loc2: 399 to 0000000000e2fe1f + DW_CFA_advance_loc2: 397 to 0000000000e2fe1f DW_CFA_remember_state DW_CFA_def_cfa: r7 (rsp) ofs 8 DW_CFA_advance_loc: 1 to 0000000000e2fe20 DW_CFA_restore_state DW_CFA_nop DW_CFA_nop for 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>: e2fc80: 55 push %rbp e2fc81: 48 89 e5 mov %rsp,%rbp e2fc84: 41 57 push %r15 e2fc86: 41 56 push %r14 e2fc88: 41 55 push %r13 e2fc8a: 45 31 ed xor %r13d,%r13d e2fc8d: 41 54 push %r12 e2fc8f: 53 push %rbx e2fc90: 31 db xor %ebx,%ebx e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp) ... prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40 starts applying before or after the xorl %r13d,%r13d instruction and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying before or after the xorl %ebx,%ebx instruction. Or: -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af DW_CFA_advance_loc: 1 to 0000000000e3d161 DW_CFA_def_cfa_offset: 16 DW_CFA_offset: r6 (rbp) at cfa-16 DW_CFA_advance_loc: 9 to 0000000000e3d16a DW_CFA_def_cfa_register: r6 (rbp) - DW_CFA_advance_loc: 2 to 0000000000e3d16c + DW_CFA_advance_loc: 5 to 0000000000e3d16f DW_CFA_offset: r12 (r12) at cfa-24 - DW_CFA_advance_loc1: 66 to 0000000000e3d1ae + DW_CFA_advance_loc: 63 to 0000000000e3d1ae DW_CFA_def_cfa: r7 (rsp) ofs 8 - DW_CFA_nop - DW_CFA_nop - DW_CFA_nop for: 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>: e3d160: 55 push %rbp e3d161: 48 63 c2 movslq %edx,%rax e3d164: 49 89 f8 mov %rdi,%r8 e3d167: 48 89 e5 mov %rsp,%rbp e3d16a: 41 54 push %r12 e3d16c: 49 89 c4 mov %rax,%r12 e3d16f: 48 83 ec 08 sub $0x8,%rsp again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying before movq %rax,%r12 instruction or after it. I think if for some instructions it is possible that debugger or signal would stop in the middle of instruction, with some side-effects of the instruction already done and others not yet, then we can't do that, but at the start of instructions that modify the register the side-effects of the instruction should not have taken effect yet. Bootstrapped/regtested on x86_64-linux and i686-linux. Will e.g. GDB be happy about the changes? Thinking some more about this, what can be problematic on the GCC side is inline asm, that can and often will contain multiple instructions. For that is an easy fix, just test asm_noperands and handle clobbers_queued_reg_save before the insn rather than after in that case. But there is another problem, instruction patterns that emit multiple hw instructions, code can stop in between them. So, do we want some instruction attribute that would conservatively tell us which instruction patterns are guaranteed to be single machine instructions? 2021-03-24 Jakub Jelinek <jakub@redhat.com> PR debug/99334 * dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn) forced flushes of queued register saves for non-call/jump insns that can't throw, emit them first after the insn rather than before it. --- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100 +++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100 @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e dwarf2out_flush_queued_reg_saves (); } else if (!NONJUMP_INSN_P (insn) - || clobbers_queued_reg_save (insn) || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL)) dwarf2out_flush_queued_reg_saves (); any_cfis_emitted = false; add_cfi_insn = insn; + + if (queued_reg_saves.length () && clobbers_queued_reg_save (insn)) + dwarf2out_flush_queued_reg_saves (); + scan_insn_after (insn); control = insn; } Jakub ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] 2021-03-24 9:31 [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] Jakub Jelinek @ 2021-03-25 21:43 ` Jason Merrill 2021-03-26 17:29 ` Jakub Jelinek 0 siblings, 1 reply; 4+ messages in thread From: Jason Merrill @ 2021-03-25 21:43 UTC (permalink / raw) To: Jakub Jelinek, Richard Henderson; +Cc: gcc-patches, gdb, Mark Wielaard On 3/24/21 5:31 AM, Jakub Jelinek wrote: > Hi! > > On the testcase in the PR with > -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign > we have prologue: > 0000000000000000 <_func_with_dwarf_issue_>: > 0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10 > 5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp > 9: 41 ff 72 f8 pushq -0x8(%r10) > d: 55 push %rbp > e: 48 89 e5 mov %rsp,%rbp > 11: 41 57 push %r15 > 13: 41 56 push %r14 > 15: 41 55 push %r13 > 17: 41 54 push %r12 > 19: 41 52 push %r10 > 1b: 53 push %rbx > 1c: 48 83 ec 20 sub $0x20,%rsp > and emit > 00000000 0000000000000014 00000000 CIE > Version: 1 > Augmentation: "zR" > Code alignment factor: 1 > Data alignment factor: -8 > Return address column: 16 > Augmentation data: 1b > DW_CFA_def_cfa: r7 (rsp) ofs 8 > DW_CFA_offset: r16 (rip) at cfa-8 > DW_CFA_nop > DW_CFA_nop > > 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 > DW_CFA_advance_loc: 5 to 0000000000000005 > DW_CFA_def_cfa: r10 (r10) ofs 0 > DW_CFA_advance_loc: 9 to 000000000000000e > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > DW_CFA_advance_loc: 13 to 000000000000001b > DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) > DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) > DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) > DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24) > DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32) > ... > unwind info for that. The problem is when async signal > (or stepping through in the debugger) stops after the pushq %rbp > instruction and before movq %rsp, %rbp, the unwind info says that > caller's %rbp is saved there at *%rbp, but that is not true, caller's > %rbp is either still available in the %rbp register, or in *%rsp, > only after executing the next instruction - movq %rsp, %rbp - the > location for %rbp is correct. So, either we'd need to temporarily > say: > DW_CFA_advance_loc: 9 to 000000000000000e > DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0) > DW_CFA_advance_loc: 3 to 0000000000000011 > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > DW_CFA_advance_loc: 10 to 000000000000001b > or to me it seems more compact to just say: > DW_CFA_advance_loc: 12 to 0000000000000011 > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > DW_CFA_advance_loc: 10 to 000000000000001b > > The following patch implements the latter. It keeps the call/jump and > flush queue handling as is and only moves the clobbers_queued_reg_save > forced flushes from before the instruction to immediately after the first > instruction. My reading of libgcc unwinder is that while for calls it > executes cfi instruction until < pc (but pc is the end of call anyway), > for async signal stops it executes cfi instruction until <= pc, so it > will only omit cfi instructions starting with advance to a higher loc. > On the testcase from the PR, the difference with the patch is: > @@ -16,9 +16,9 @@ Contents of the .eh_frame section: > 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 > DW_CFA_advance_loc: 5 to 0000000000000005 > DW_CFA_def_cfa: r10 (r10) ofs 0 > - DW_CFA_advance_loc: 9 to 000000000000000e > + DW_CFA_advance_loc: 12 to 0000000000000011 > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > - DW_CFA_advance_loc: 13 to 000000000000001b > + DW_CFA_advance_loc: 10 to 000000000000001b > DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) > DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) > DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) > When looking at other changes, e.g. on cc1plus itself when it has > identical objdump -dr, the without patch to with patch difference > is e.g.: > 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d > DW_CFA_advance_loc: 1 to 0000000000e2fc81 > DW_CFA_def_cfa_offset: 16 > DW_CFA_offset: r6 (rbp) at cfa-16 > DW_CFA_advance_loc: 3 to 0000000000e2fc84 > DW_CFA_def_cfa_register: r6 (rbp) > - DW_CFA_advance_loc: 6 to 0000000000e2fc8a > + DW_CFA_advance_loc: 9 to 0000000000e2fc8d > DW_CFA_offset: r15 (r15) at cfa-24 > DW_CFA_offset: r14 (r14) at cfa-32 > DW_CFA_offset: r13 (r13) at cfa-40 > - DW_CFA_advance_loc: 6 to 0000000000e2fc90 > + DW_CFA_advance_loc: 5 to 0000000000e2fc92 > DW_CFA_offset: r12 (r12) at cfa-48 > DW_CFA_offset: r3 (rbx) at cfa-56 > - DW_CFA_advance_loc2: 399 to 0000000000e2fe1f > + DW_CFA_advance_loc2: 397 to 0000000000e2fe1f > DW_CFA_remember_state > DW_CFA_def_cfa: r7 (rsp) ofs 8 > DW_CFA_advance_loc: 1 to 0000000000e2fe20 > DW_CFA_restore_state > DW_CFA_nop > DW_CFA_nop > for > 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>: > e2fc80: 55 push %rbp > e2fc81: 48 89 e5 mov %rsp,%rbp > e2fc84: 41 57 push %r15 > e2fc86: 41 56 push %r14 > e2fc88: 41 55 push %r13 > e2fc8a: 45 31 ed xor %r13d,%r13d > e2fc8d: 41 54 push %r12 > e2fc8f: 53 push %rbx > e2fc90: 31 db xor %ebx,%ebx > e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp > e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp) > ... > prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40 > starts applying before or after the xorl %r13d,%r13d instruction > and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying > before or after the xorl %ebx,%ebx instruction. > Or: > -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af > +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af > DW_CFA_advance_loc: 1 to 0000000000e3d161 > DW_CFA_def_cfa_offset: 16 > DW_CFA_offset: r6 (rbp) at cfa-16 > DW_CFA_advance_loc: 9 to 0000000000e3d16a > DW_CFA_def_cfa_register: r6 (rbp) > - DW_CFA_advance_loc: 2 to 0000000000e3d16c > + DW_CFA_advance_loc: 5 to 0000000000e3d16f > DW_CFA_offset: r12 (r12) at cfa-24 > - DW_CFA_advance_loc1: 66 to 0000000000e3d1ae > + DW_CFA_advance_loc: 63 to 0000000000e3d1ae > DW_CFA_def_cfa: r7 (rsp) ofs 8 > - DW_CFA_nop > - DW_CFA_nop > - DW_CFA_nop > for: > 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>: > e3d160: 55 push %rbp > e3d161: 48 63 c2 movslq %edx,%rax > e3d164: 49 89 f8 mov %rdi,%r8 > e3d167: 48 89 e5 mov %rsp,%rbp > e3d16a: 41 54 push %r12 > e3d16c: 49 89 c4 mov %rax,%r12 > e3d16f: 48 83 ec 08 sub $0x8,%rsp > again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying > before movq %rax,%r12 instruction or after it. > I think if for some instructions it is possible that debugger or signal > would stop in the middle of instruction, with some side-effects of the > instruction already done and others not yet, then we can't do that, > but at the start of instructions that modify the register the side-effects > of the instruction should not have taken effect yet. > > Bootstrapped/regtested on x86_64-linux and i686-linux. LGTM. This seems consistent with when we flush due to clobbers in dwarf2out_frame_debug. > Will e.g. GDB be happy about the changes? I would expect so, but it would be good to have someone from GDB verify. > Thinking some more about this, what can be problematic on the GCC side > is inline asm, that can and often will contain multiple instructions. > For that is an easy fix, just test asm_noperands and handle > clobbers_queued_reg_save before the insn rather than after in that case. Sure, but does inline asm go through dwarf2out_frame_debug? > But there is another problem, instruction patterns that emit multiple > hw instructions, code can stop in between them. So, do we want some > instruction attribute that would conservatively tell us which instruction > patterns are guaranteed to be single machine instructions? It seems to me that in that situation you'd want to add the save to *%rsp, and still not update to *%rbp until after the combined instruction. Incidentally, it seems a bit odd that clobbers_queued_reg_save returns true both for clobbering the saved register and for clobbering the save location; if I already know that %rbp is saved at *%rsp, why do I care that the insn we're looking at clobbers %rbp? Maybe we should only check modified_in_p (q->reg, insn) if q->reg doesn't appear in regs_saved_in_regs. That wouldn't help with this testcase, though. > 2021-03-24 Jakub Jelinek <jakub@redhat.com> > > PR debug/99334 > * dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn) > forced flushes of queued register saves for non-call/jump insns > that can't throw, emit them first after the insn rather than > before it. > > --- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100 > +++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100 > @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e > dwarf2out_flush_queued_reg_saves (); > } > else if (!NONJUMP_INSN_P (insn) > - || clobbers_queued_reg_save (insn) > || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL)) > dwarf2out_flush_queued_reg_saves (); > any_cfis_emitted = false; > > add_cfi_insn = insn; > + > + if (queued_reg_saves.length () && clobbers_queued_reg_save (insn)) > + dwarf2out_flush_queued_reg_saves (); > + > scan_insn_after (insn); > control = insn; > } > > > Jakub > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] 2021-03-25 21:43 ` Jason Merrill @ 2021-03-26 17:29 ` Jakub Jelinek 2021-03-26 19:52 ` Jason Merrill 0 siblings, 1 reply; 4+ messages in thread From: Jakub Jelinek @ 2021-03-26 17:29 UTC (permalink / raw) To: Jason Merrill; +Cc: Richard Henderson, gdb, gcc-patches, Mark Wielaard On Thu, Mar 25, 2021 at 05:43:57PM -0400, Jason Merrill via Gcc-patches wrote: > > Will e.g. GDB be happy about the changes? > > I would expect so, but it would be good to have someone from GDB verify. Ok, have verified it now on the testcase from the PR (stubbed out what it calls, called it from main with all NULL pointers and just in gdb stepped through the prologue and at each insn looked at p $rsp p $rbp up p $rsp p $rbp down stepi With unpatched gcc indeed it works everywhere except when stopping at the start of movq %rsp, %rbp instruction where $rbp in the parent frame was unrelated garbage. And with the patch below, I get the expected values in all cases. > > Thinking some more about this, what can be problematic on the GCC side > > is inline asm, that can and often will contain multiple instructions. > > For that is an easy fix, just test asm_noperands and handle > > clobbers_queued_reg_save before the insn rather than after in that case. > > Sure, but does inline asm go through dwarf2out_frame_debug? Not through dwarf2out_frame_debug, as inline asm is not RTX_FRAME_RELATED_P, but it goes through that scan_trace hunk I was changing like any other insn (except for BARRIERs, instructions not in basic blocks, some NOTEs and DEBUG_INSNs). And clobbers_queued_reg_save (insn) can be true for the inline asm either because it stores some registers or because it clobbers them. And then there are other backend insns that expand to multiple hw instructions, e.g. in the i386 backend typically (but not sure if we guarantee that) marked as get_attr_type (insn) == TYPE_MULTI. So, if we wanted to do what my patch did, we would need to come up with some new (optional) insn attribute, say get_attr_undivisible (insn), e.g. define that attribute on i386 to "type" attribute equal to "multi" by default and perhaps with some exceptions if needed, and use that attribute if HAVE_ATTR_undivisible in dwarf2cfi.c to decide whether it can be emitted after the insn or needs to be emitted before it. > > But there is another problem, instruction patterns that emit multiple > > hw instructions, code can stop in between them. So, do we want some > > instruction attribute that would conservatively tell us which instruction > > patterns are guaranteed to be single machine instructions? > > It seems to me that in that situation you'd want to add the save to *%rsp, > and still not update to *%rbp until after the combined instruction. So, today I've tried instead to deal with it through REG_FRAME_RELATED_EXPR from the backend, but that failed miserably as explained in the PR, dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the dynamic stack realignment using drap register that only the i386 backend does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we can't emulate those rules. The following patch instead does the deferring of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and emits it on the (set hfp sp) assignment that must appear shortly after it and adds assertion that it is the case. The difference before/after the patch on the assembly is: --- pr99334.s~ 2021-03-26 15:42:40.881749380 +0100 +++ pr99334.s 2021-03-26 17:38:05.729161910 +0100 @@ -11,8 +11,8 @@ _func_with_dwarf_issue_: andq $-16, %rsp pushq -8(%r10) pushq %rbp - .cfi_escape 0x10,0x6,0x2,0x76,0 movq %rsp, %rbp + .cfi_escape 0x10,0x6,0x2,0x76,0 pushq %r15 pushq %r14 pushq %r13 i.e. does just what we IMHO need, after pushq %rbp %rbp still contains parent's frame value and so the save rule doesn't need to be overridden there, ditto at the start of the next insn before the side-effect took effect, and we override it only after it when %rbp already has the right value. If some other target adds dynamic stack realignment in the future and the offset 0 case wouldn't be true there, the code can be adjusted so that it works on all the drap architectures, I'm pretty sure the code would need other adjustments too. For the rule 18 and for the (set hfp sp) after it we already have asserts for the drap cases that check whether the code looks the way i?86/x86_64 emit it currently. Is this ok for trunk if it passes bootstrap/regtest on x86_64-linux and i686-linux (I've verified no other target has the drap stuff right now, nvptx does nvptx_get_drap_rtx but no other realignment stack (unclear why?), but nvptx is a non-RA target so dwarf2cfi doesn't apply to it)? 2021-03-26 Jakub Jelinek <jakub@redhat.com> PR debug/99334 * dwarf2out.h (struct dw_fde_node): Add rule18 member. * dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp) assignment with drap_reg active, queue reg save for hfp with offset 0 and flush queued reg saves. When handling a push with rule18, defer queueing reg save for hfp and just assert the offset is 0. (scan_trace): Assert that fde->rule18 is false. --- gcc/dwarf2out.h.jj 2021-01-04 10:25:38.718235126 +0100 +++ gcc/dwarf2out.h 2021-03-26 17:12:30.599007479 +0100 @@ -108,6 +108,12 @@ struct GTY(()) dw_fde_node { /* True iff dw_fde_second_begin label is in text_section or cold_text_section. */ unsigned second_in_std_section : 1; + /* True if Rule 18 described in dwarf2cfi.c is in action, i.e. for dynamic + stack realignment in between pushing of hard frame pointer to stack + and setting hard frame pointer to stack pointer. The register save for + hard frame pointer register should be emitted only on the latter + instruction. */ + unsigned rule18 : 1; }; --- gcc/dwarf2cfi.c.jj 2021-03-23 19:42:05.000000000 +0100 +++ gcc/dwarf2cfi.c 2021-03-26 17:31:58.133165709 +0100 @@ -1695,9 +1695,19 @@ dwarf2out_frame_debug_expr (rtx expr) if (fde && fde->stack_realign && REGNO (src) == STACK_POINTER_REGNUM) - gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM - && fde->drap_reg != INVALID_REGNUM - && cur_cfa->reg != dwf_regno (src)); + { + gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM + && fde->drap_reg != INVALID_REGNUM + && cur_cfa->reg != dwf_regno (src) + && fde->rule18); + fde->rule18 = 0; + /* The save of hard frame pointer has been deferred + until this point when Rule 18 applied. Emit it now. */ + queue_reg_save (dest, NULL_RTX, 0); + /* And as the instruction modifies the hard frame pointer, + flush the queue as well. */ + dwarf2out_flush_queued_reg_saves (); + } else queue_reg_save (src, dest, 0); } @@ -1907,6 +1917,7 @@ dwarf2out_frame_debug_expr (rtx expr) { gcc_assert (cur_cfa->reg != dw_frame_pointer_regnum); cur_trace->cfa_store.offset = 0; + fde->rule18 = 1; } if (cur_cfa->reg == dw_stack_pointer_regnum) @@ -2041,7 +2052,17 @@ dwarf2out_frame_debug_expr (rtx expr) span = NULL; if (!span) - queue_reg_save (src, NULL_RTX, offset); + { + if (fde->rule18) + /* Just verify the hard frame pointer save when doing dynamic + realignment uses expected offset. The actual queue_reg_save + needs to be deferred until the instruction that sets + hard frame pointer to stack pointer, see PR99334 for + details. */ + gcc_assert (known_eq (offset, 0)); + else + queue_reg_save (src, NULL_RTX, offset); + } else { /* We have a PARALLEL describing where the contents of SRC live. @@ -2732,6 +2753,7 @@ scan_trace (dw_trace_info *trace, bool e create_trace_edges (control); } + gcc_assert (!cfun->fde || !cfun->fde->rule18); add_cfi_insn = NULL; cur_row = NULL; cur_trace = NULL; Jakub ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] 2021-03-26 17:29 ` Jakub Jelinek @ 2021-03-26 19:52 ` Jason Merrill 0 siblings, 0 replies; 4+ messages in thread From: Jason Merrill @ 2021-03-26 19:52 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Richard Henderson, gdb, gcc-patches, Mark Wielaard On 3/26/21 1:29 PM, Jakub Jelinek wrote: > On Thu, Mar 25, 2021 at 05:43:57PM -0400, Jason Merrill via Gcc-patches wrote: >>> Will e.g. GDB be happy about the changes? >> >> I would expect so, but it would be good to have someone from GDB verify. > > Ok, have verified it now on the testcase from the PR (stubbed out what it > calls, called it from main with all NULL pointers and just in gdb stepped > through the prologue and at each insn looked at > p $rsp > p $rbp > up > p $rsp > p $rbp > down > stepi > With unpatched gcc indeed it works everywhere except when stopping > at the start of movq %rsp, %rbp instruction where $rbp in the parent > frame was unrelated garbage. > And with the patch below, I get the expected values in all cases. > >>> Thinking some more about this, what can be problematic on the GCC side >>> is inline asm, that can and often will contain multiple instructions. >>> For that is an easy fix, just test asm_noperands and handle >>> clobbers_queued_reg_save before the insn rather than after in that case. >> >> Sure, but does inline asm go through dwarf2out_frame_debug? > > Not through dwarf2out_frame_debug, as inline asm is not RTX_FRAME_RELATED_P, > but it goes through that scan_trace hunk I was changing like any other > insn (except for BARRIERs, instructions not in basic blocks, some NOTEs > and DEBUG_INSNs). And clobbers_queued_reg_save (insn) can be true for the > inline asm either because it stores some registers or because it clobbers > them. > And then there are other backend insns that expand to multiple hw > instructions, e.g. in the i386 backend typically (but not sure if we > guarantee that) marked as get_attr_type (insn) == TYPE_MULTI. > So, if we wanted to do what my patch did, we would need to come up > with some new (optional) insn attribute, say get_attr_undivisible (insn), > e.g. define that attribute on i386 to "type" attribute equal to "multi" > by default and perhaps with some exceptions if needed, and use that > attribute if HAVE_ATTR_undivisible in dwarf2cfi.c to decide whether it can > be emitted after the insn or needs to be emitted before it. > >>> But there is another problem, instruction patterns that emit multiple >>> hw instructions, code can stop in between them. So, do we want some >>> instruction attribute that would conservatively tell us which instruction >>> patterns are guaranteed to be single machine instructions? >> >> It seems to me that in that situation you'd want to add the save to *%rsp, >> and still not update to *%rbp until after the combined instruction. > > So, today I've tried instead to deal with it through REG_FRAME_RELATED_EXPR > from the backend, but that failed miserably as explained in the PR, > dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the > dynamic stack realignment using drap register that only the i386 backend > does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we > can't emulate those rules. The following patch instead does the deferring > of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and > emits it on the (set hfp sp) assignment that must appear shortly after it > and adds assertion that it is the case. > > The difference before/after the patch on the assembly is: > --- pr99334.s~ 2021-03-26 15:42:40.881749380 +0100 > +++ pr99334.s 2021-03-26 17:38:05.729161910 +0100 > @@ -11,8 +11,8 @@ _func_with_dwarf_issue_: > andq $-16, %rsp > pushq -8(%r10) > pushq %rbp > - .cfi_escape 0x10,0x6,0x2,0x76,0 > movq %rsp, %rbp > + .cfi_escape 0x10,0x6,0x2,0x76,0 > pushq %r15 > pushq %r14 > pushq %r13 > i.e. does just what we IMHO need, after pushq %rbp %rbp > still contains parent's frame value and so the save rule doesn't > need to be overridden there, ditto at the start of the next insn > before the side-effect took effect, and we override it only after > it when %rbp already has the right value. > > If some other target adds dynamic stack realignment in the future and > the offset 0 case wouldn't be true there, the code can be adjusted so that > it works on all the drap architectures, I'm pretty sure the code would > need other adjustments too. > > For the rule 18 and for the (set hfp sp) after it we already have asserts > for the drap cases that check whether the code looks the way i?86/x86_64 > emit it currently. > > Is this ok for trunk if it passes bootstrap/regtest on x86_64-linux and > i686-linux (I've verified no other target has the drap stuff right now, > nvptx does nvptx_get_drap_rtx but no other realignment stack (unclear why?), > but nvptx is a non-RA target so dwarf2cfi doesn't apply to it)? I looked into the issue a bit. So the problem is that after push %rbp, the queued reg save is trying to say that %rbp is now saved at CFA offset 0, which is correct. Except that then we hit > /* When stack is aligned, store REG using DW_CFA_expression with FP. */ > if (fde && fde->stack_realign) > { > cfi->dw_cfi_opc = DW_CFA_expression; > cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg; > cfi->dw_cfi_oprnd2.dw_cfi_loc > = build_cfa_aligned_loc (&cur_row->cfa, offset, > fde->stack_realignment); > } Which turns it into saying %rbp is saved at an offset from %rbp, which is not yet true; we can't safely do this transformation until the FP is actually set up. This code goes back to r138335, which introduced the DRAP handling. Your patch means that we don't emit the save until after the FP is set up, which works since this instruction is atomic: before the instruction, %rbp is in %rbp; after the instruction, %rbp is at *%rbp. Since this is specifically a problem with setting the frame pointer, your patch seems like a reasonable solution, assuming no other reg saves are ever emitted before this one. The patch is OK. > 2021-03-26 Jakub Jelinek <jakub@redhat.com> > > PR debug/99334 > * dwarf2out.h (struct dw_fde_node): Add rule18 member. > * dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp) > assignment with drap_reg active, queue reg save for hfp with offset 0 > and flush queued reg saves. When handling a push with rule18, > defer queueing reg save for hfp and just assert the offset is 0. > (scan_trace): Assert that fde->rule18 is false. > > --- gcc/dwarf2out.h.jj 2021-01-04 10:25:38.718235126 +0100 > +++ gcc/dwarf2out.h 2021-03-26 17:12:30.599007479 +0100 > @@ -108,6 +108,12 @@ struct GTY(()) dw_fde_node { > /* True iff dw_fde_second_begin label is in text_section or > cold_text_section. */ > unsigned second_in_std_section : 1; > + /* True if Rule 18 described in dwarf2cfi.c is in action, i.e. for dynamic > + stack realignment in between pushing of hard frame pointer to stack > + and setting hard frame pointer to stack pointer. The register save for > + hard frame pointer register should be emitted only on the latter > + instruction. */ > + unsigned rule18 : 1; > }; > > > --- gcc/dwarf2cfi.c.jj 2021-03-23 19:42:05.000000000 +0100 > +++ gcc/dwarf2cfi.c 2021-03-26 17:31:58.133165709 +0100 > @@ -1695,9 +1695,19 @@ dwarf2out_frame_debug_expr (rtx expr) > if (fde > && fde->stack_realign > && REGNO (src) == STACK_POINTER_REGNUM) > - gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM > - && fde->drap_reg != INVALID_REGNUM > - && cur_cfa->reg != dwf_regno (src)); > + { > + gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM > + && fde->drap_reg != INVALID_REGNUM > + && cur_cfa->reg != dwf_regno (src) > + && fde->rule18); > + fde->rule18 = 0; > + /* The save of hard frame pointer has been deferred > + until this point when Rule 18 applied. Emit it now. */ > + queue_reg_save (dest, NULL_RTX, 0); > + /* And as the instruction modifies the hard frame pointer, > + flush the queue as well. */ > + dwarf2out_flush_queued_reg_saves (); > + } > else > queue_reg_save (src, dest, 0); > } > @@ -1907,6 +1917,7 @@ dwarf2out_frame_debug_expr (rtx expr) > { > gcc_assert (cur_cfa->reg != dw_frame_pointer_regnum); > cur_trace->cfa_store.offset = 0; > + fde->rule18 = 1; > } > > if (cur_cfa->reg == dw_stack_pointer_regnum) > @@ -2041,7 +2052,17 @@ dwarf2out_frame_debug_expr (rtx expr) > span = NULL; > > if (!span) > - queue_reg_save (src, NULL_RTX, offset); > + { > + if (fde->rule18) > + /* Just verify the hard frame pointer save when doing dynamic > + realignment uses expected offset. The actual queue_reg_save > + needs to be deferred until the instruction that sets > + hard frame pointer to stack pointer, see PR99334 for > + details. */ > + gcc_assert (known_eq (offset, 0)); > + else > + queue_reg_save (src, NULL_RTX, offset); > + } > else > { > /* We have a PARALLEL describing where the contents of SRC live. > @@ -2732,6 +2753,7 @@ scan_trace (dw_trace_info *trace, bool e > create_trace_edges (control); > } > > + gcc_assert (!cfun->fde || !cfun->fde->rule18); > add_cfi_insn = NULL; > cur_row = NULL; > cur_trace = NULL; > > > Jakub > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-03-26 19:52 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-24 9:31 [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] Jakub Jelinek 2021-03-25 21:43 ` Jason Merrill 2021-03-26 17:29 ` Jakub Jelinek 2021-03-26 19:52 ` Jason Merrill
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).