public inbox for gdb@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
@ 2021-03-24  9:31 Jakub Jelinek
  2021-03-25 21:43 ` Jason Merrill
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2021-03-24  9:31 UTC (permalink / raw)
  To: Richard Henderson, Jason Merrill; +Cc: gcc-patches, gdb, Mark Wielaard

Hi!

On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
   0:	4c 8d 54 24 08       	lea    0x8(%rsp),%r10
   5:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
   9:	41 ff 72 f8          	pushq  -0x8(%r10)
   d:	55                   	push   %rbp
   e:	48 89 e5             	mov    %rsp,%rbp
  11:	41 57                	push   %r15
  13:	41 56                	push   %r14
  15:	41 55                	push   %r13
  17:	41 54                	push   %r12
  19:	41 52                	push   %r10
  1b:	53                   	push   %rbx
  1c:	48 83 ec 20          	sub    $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:     1b
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8
  DW_CFA_nop
  DW_CFA_nop

00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
  DW_CFA_advance_loc: 5 to 0000000000000005
  DW_CFA_def_cfa: r10 (r10) ofs 0
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 13 to 000000000000001b
  DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
  DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
  DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
  DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
  DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that.  The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct.  So, either we'd need to temporarily
say:
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
  DW_CFA_advance_loc: 3 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
  DW_CFA_advance_loc: 12 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b

The following patch implements the latter.  It keeps the call/jump and
flush queue handling as is and only moves the clobbers_queued_reg_save
forced flushes from before the instruction to immediately after the first
instruction.  My reading of libgcc unwinder is that while for calls it
executes cfi instruction until < pc (but pc is the end of call anyway),
for async signal stops it executes cfi instruction until <= pc, so it
will only omit cfi instructions starting with advance to a higher loc.
On the testcase from the PR, the difference with the patch is:
@@ -16,9 +16,9 @@ Contents of the .eh_frame section:
 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
   DW_CFA_advance_loc: 5 to 0000000000000005
   DW_CFA_def_cfa: r10 (r10) ofs 0
-  DW_CFA_advance_loc: 9 to 000000000000000e
+  DW_CFA_advance_loc: 12 to 0000000000000011
   DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
-  DW_CFA_advance_loc: 13 to 000000000000001b
+  DW_CFA_advance_loc: 10 to 000000000000001b
   DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
   DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
   DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
When looking at other changes, e.g. on cc1plus itself when it has
identical objdump -dr, the without patch to with patch difference
is e.g.:
 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d
   DW_CFA_advance_loc: 1 to 0000000000e2fc81
   DW_CFA_def_cfa_offset: 16
   DW_CFA_offset: r6 (rbp) at cfa-16
   DW_CFA_advance_loc: 3 to 0000000000e2fc84
   DW_CFA_def_cfa_register: r6 (rbp)
-  DW_CFA_advance_loc: 6 to 0000000000e2fc8a
+  DW_CFA_advance_loc: 9 to 0000000000e2fc8d
   DW_CFA_offset: r15 (r15) at cfa-24
   DW_CFA_offset: r14 (r14) at cfa-32
   DW_CFA_offset: r13 (r13) at cfa-40
-  DW_CFA_advance_loc: 6 to 0000000000e2fc90
+  DW_CFA_advance_loc: 5 to 0000000000e2fc92
   DW_CFA_offset: r12 (r12) at cfa-48
   DW_CFA_offset: r3 (rbx) at cfa-56
-  DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
+  DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
   DW_CFA_remember_state
   DW_CFA_def_cfa: r7 (rsp) ofs 8
   DW_CFA_advance_loc: 1 to 0000000000e2fe20
   DW_CFA_restore_state
   DW_CFA_nop
   DW_CFA_nop
for
0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
  e2fc80:       55                      push   %rbp
  e2fc81:       48 89 e5                mov    %rsp,%rbp
  e2fc84:       41 57                   push   %r15
  e2fc86:       41 56                   push   %r14
  e2fc88:       41 55                   push   %r13
  e2fc8a:       45 31 ed                xor    %r13d,%r13d
  e2fc8d:       41 54                   push   %r12
  e2fc8f:       53                      push   %rbx
  e2fc90:       31 db                   xor    %ebx,%ebx
  e2fc92:       48 81 ec 98 02 00 00    sub    $0x298,%rsp
  e2fc99:       89 7d c4                mov    %edi,-0x3c(%rbp)
...
prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
starts applying before or after the xorl %r13d,%r13d instruction
and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
before or after the xorl %ebx,%ebx instruction.
Or:
-00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
+00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
   DW_CFA_advance_loc: 1 to 0000000000e3d161
   DW_CFA_def_cfa_offset: 16
   DW_CFA_offset: r6 (rbp) at cfa-16
   DW_CFA_advance_loc: 9 to 0000000000e3d16a
   DW_CFA_def_cfa_register: r6 (rbp)
-  DW_CFA_advance_loc: 2 to 0000000000e3d16c
+  DW_CFA_advance_loc: 5 to 0000000000e3d16f
   DW_CFA_offset: r12 (r12) at cfa-24
-  DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
+  DW_CFA_advance_loc: 63 to 0000000000e3d1ae
   DW_CFA_def_cfa: r7 (rsp) ofs 8
-  DW_CFA_nop
-  DW_CFA_nop
-  DW_CFA_nop
for:
0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
  e3d160:       55                      push   %rbp
  e3d161:       48 63 c2                movslq %edx,%rax
  e3d164:       49 89 f8                mov    %rdi,%r8
  e3d167:       48 89 e5                mov    %rsp,%rbp
  e3d16a:       41 54                   push   %r12
  e3d16c:       49 89 c4                mov    %rax,%r12
  e3d16f:       48 83 ec 08             sub    $0x8,%rsp
again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
before movq %rax,%r12 instruction or after it.
I think if for some instructions it is possible that debugger or signal
would stop in the middle of instruction, with some side-effects of the
instruction already done and others not yet, then we can't do that,
but at the start of instructions that modify the register the side-effects
of the instruction should not have taken effect yet.

Bootstrapped/regtested on x86_64-linux and i686-linux.

Will e.g. GDB be happy about the changes?

Thinking some more about this, what can be problematic on the GCC side
is inline asm, that can and often will contain multiple instructions.
For that is an easy fix, just test asm_noperands and handle
clobbers_queued_reg_save before the insn rather than after in that case.
But there is another problem, instruction patterns that emit multiple
hw instructions, code can stop in between them.  So, do we want some
instruction attribute that would conservatively tell us which instruction
patterns are guaranteed to be single machine instructions?

2021-03-24  Jakub Jelinek  <jakub@redhat.com>

	PR debug/99334
	* dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
	forced flushes of queued register saves for non-call/jump insns
	that can't throw, emit them first after the insn rather than
	before it.

--- gcc/dwarf2cfi.c.jj	2021-03-02 11:25:47.217727061 +0100
+++ gcc/dwarf2cfi.c	2021-03-23 17:34:58.240281522 +0100
@@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
 	      dwarf2out_flush_queued_reg_saves ();
 	    }
 	  else if (!NONJUMP_INSN_P (insn)
-		   || clobbers_queued_reg_save (insn)
 		   || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
 	    dwarf2out_flush_queued_reg_saves ();
 	  any_cfis_emitted = false;
 
 	  add_cfi_insn = insn;
+
+	  if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
+	    dwarf2out_flush_queued_reg_saves ();
+
 	  scan_insn_after (insn);
 	  control = insn;
 	}


	Jakub


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-03-26 19:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24  9:31 [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] Jakub Jelinek
2021-03-25 21:43 ` Jason Merrill
2021-03-26 17:29   ` Jakub Jelinek
2021-03-26 19:52     ` Jason Merrill

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).