[PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]

public inbox for gdb@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
@ 2021-03-24  9:31 Jakub Jelinek
  2021-03-25 21:43 ` Jason Merrill
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2021-03-24  9:31 UTC (permalink / raw)
  To: Richard Henderson, Jason Merrill; +Cc: gcc-patches, gdb, Mark Wielaard

Hi!

On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
   0:	4c 8d 54 24 08       	lea    0x8(%rsp),%r10
   5:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
   9:	41 ff 72 f8          	pushq  -0x8(%r10)
   d:	55                   	push   %rbp
   e:	48 89 e5             	mov    %rsp,%rbp
  11:	41 57                	push   %r15
  13:	41 56                	push   %r14
  15:	41 55                	push   %r13
  17:	41 54                	push   %r12
  19:	41 52                	push   %r10
  1b:	53                   	push   %rbx
  1c:	48 83 ec 20          	sub    $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:     1b
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8
  DW_CFA_nop
  DW_CFA_nop

00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
  DW_CFA_advance_loc: 5 to 0000000000000005
  DW_CFA_def_cfa: r10 (r10) ofs 0
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 13 to 000000000000001b
  DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
  DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
  DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
  DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
  DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that.  The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct.  So, either we'd need to temporarily
say:
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
  DW_CFA_advance_loc: 3 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
  DW_CFA_advance_loc: 12 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b

The following patch implements the latter.  It keeps the call/jump and
flush queue handling as is and only moves the clobbers_queued_reg_save
forced flushes from before the instruction to immediately after the first
instruction.  My reading of libgcc unwinder is that while for calls it
executes cfi instruction until < pc (but pc is the end of call anyway),
for async signal stops it executes cfi instruction until <= pc, so it
will only omit cfi instructions starting with advance to a higher loc.
On the testcase from the PR, the difference with the patch is:
@@ -16,9 +16,9 @@ Contents of the .eh_frame section:
 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
   DW_CFA_advance_loc: 5 to 0000000000000005
   DW_CFA_def_cfa: r10 (r10) ofs 0
-  DW_CFA_advance_loc: 9 to 000000000000000e
+  DW_CFA_advance_loc: 12 to 0000000000000011
   DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
-  DW_CFA_advance_loc: 13 to 000000000000001b
+  DW_CFA_advance_loc: 10 to 000000000000001b
   DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
   DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
   DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
When looking at other changes, e.g. on cc1plus itself when it has
identical objdump -dr, the without patch to with patch difference
is e.g.:
 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d
   DW_CFA_advance_loc: 1 to 0000000000e2fc81
   DW_CFA_def_cfa_offset: 16
   DW_CFA_offset: r6 (rbp) at cfa-16
   DW_CFA_advance_loc: 3 to 0000000000e2fc84
   DW_CFA_def_cfa_register: r6 (rbp)
-  DW_CFA_advance_loc: 6 to 0000000000e2fc8a
+  DW_CFA_advance_loc: 9 to 0000000000e2fc8d
   DW_CFA_offset: r15 (r15) at cfa-24
   DW_CFA_offset: r14 (r14) at cfa-32
   DW_CFA_offset: r13 (r13) at cfa-40
-  DW_CFA_advance_loc: 6 to 0000000000e2fc90
+  DW_CFA_advance_loc: 5 to 0000000000e2fc92
   DW_CFA_offset: r12 (r12) at cfa-48
   DW_CFA_offset: r3 (rbx) at cfa-56
-  DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
+  DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
   DW_CFA_remember_state
   DW_CFA_def_cfa: r7 (rsp) ofs 8
   DW_CFA_advance_loc: 1 to 0000000000e2fe20
   DW_CFA_restore_state
   DW_CFA_nop
   DW_CFA_nop
for
0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
  e2fc80:       55                      push   %rbp
  e2fc81:       48 89 e5                mov    %rsp,%rbp
  e2fc84:       41 57                   push   %r15
  e2fc86:       41 56                   push   %r14
  e2fc88:       41 55                   push   %r13
  e2fc8a:       45 31 ed                xor    %r13d,%r13d
  e2fc8d:       41 54                   push   %r12
  e2fc8f:       53                      push   %rbx
  e2fc90:       31 db                   xor    %ebx,%ebx
  e2fc92:       48 81 ec 98 02 00 00    sub    $0x298,%rsp
  e2fc99:       89 7d c4                mov    %edi,-0x3c(%rbp)
...
prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
starts applying before or after the xorl %r13d,%r13d instruction
and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
before or after the xorl %ebx,%ebx instruction.
Or:
-00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
+00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
   DW_CFA_advance_loc: 1 to 0000000000e3d161
   DW_CFA_def_cfa_offset: 16
   DW_CFA_offset: r6 (rbp) at cfa-16
   DW_CFA_advance_loc: 9 to 0000000000e3d16a
   DW_CFA_def_cfa_register: r6 (rbp)
-  DW_CFA_advance_loc: 2 to 0000000000e3d16c
+  DW_CFA_advance_loc: 5 to 0000000000e3d16f
   DW_CFA_offset: r12 (r12) at cfa-24
-  DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
+  DW_CFA_advance_loc: 63 to 0000000000e3d1ae
   DW_CFA_def_cfa: r7 (rsp) ofs 8
-  DW_CFA_nop
-  DW_CFA_nop
-  DW_CFA_nop
for:
0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
  e3d160:       55                      push   %rbp
  e3d161:       48 63 c2                movslq %edx,%rax
  e3d164:       49 89 f8                mov    %rdi,%r8
  e3d167:       48 89 e5                mov    %rsp,%rbp
  e3d16a:       41 54                   push   %r12
  e3d16c:       49 89 c4                mov    %rax,%r12
  e3d16f:       48 83 ec 08             sub    $0x8,%rsp
again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
before movq %rax,%r12 instruction or after it.
I think if for some instructions it is possible that debugger or signal
would stop in the middle of instruction, with some side-effects of the
instruction already done and others not yet, then we can't do that,
but at the start of instructions that modify the register the side-effects
of the instruction should not have taken effect yet.

Bootstrapped/regtested on x86_64-linux and i686-linux.

Will e.g. GDB be happy about the changes?

Thinking some more about this, what can be problematic on the GCC side
is inline asm, that can and often will contain multiple instructions.
For that is an easy fix, just test asm_noperands and handle
clobbers_queued_reg_save before the insn rather than after in that case.
But there is another problem, instruction patterns that emit multiple
hw instructions, code can stop in between them.  So, do we want some
instruction attribute that would conservatively tell us which instruction
patterns are guaranteed to be single machine instructions?

2021-03-24  Jakub Jelinek  <jakub@redhat.com>

	PR debug/99334
	* dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
	forced flushes of queued register saves for non-call/jump insns
	that can't throw, emit them first after the insn rather than
	before it.

--- gcc/dwarf2cfi.c.jj	2021-03-02 11:25:47.217727061 +0100
+++ gcc/dwarf2cfi.c	2021-03-23 17:34:58.240281522 +0100
@@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
 	      dwarf2out_flush_queued_reg_saves ();
 	    }
 	  else if (!NONJUMP_INSN_P (insn)
-		   || clobbers_queued_reg_save (insn)
 		   || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
 	    dwarf2out_flush_queued_reg_saves ();
 	  any_cfis_emitted = false;
 
 	  add_cfi_insn = insn;
+
+	  if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
+	    dwarf2out_flush_queued_reg_saves ();
+
 	  scan_insn_after (insn);
 	  control = insn;
 	}


	Jakub


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
  2021-03-24  9:31 [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] Jakub Jelinek
@ 2021-03-25 21:43 ` Jason Merrill
  2021-03-26 17:29   ` Jakub Jelinek
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Merrill @ 2021-03-25 21:43 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Henderson; +Cc: gcc-patches, gdb, Mark Wielaard

On 3/24/21 5:31 AM, Jakub Jelinek wrote:
> Hi!
> 
> On the testcase in the PR with
> -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
> we have prologue:
> 0000000000000000 <_func_with_dwarf_issue_>:
>     0:	4c 8d 54 24 08       	lea    0x8(%rsp),%r10
>     5:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
>     9:	41 ff 72 f8          	pushq  -0x8(%r10)
>     d:	55                   	push   %rbp
>     e:	48 89 e5             	mov    %rsp,%rbp
>    11:	41 57                	push   %r15
>    13:	41 56                	push   %r14
>    15:	41 55                	push   %r13
>    17:	41 54                	push   %r12
>    19:	41 52                	push   %r10
>    1b:	53                   	push   %rbx
>    1c:	48 83 ec 20          	sub    $0x20,%rsp
> and emit
> 00000000 0000000000000014 00000000 CIE
>    Version:               1
>    Augmentation:          "zR"
>    Code alignment factor: 1
>    Data alignment factor: -8
>    Return address column: 16
>    Augmentation data:     1b
>    DW_CFA_def_cfa: r7 (rsp) ofs 8
>    DW_CFA_offset: r16 (rip) at cfa-8
>    DW_CFA_nop
>    DW_CFA_nop
> 
> 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
>    DW_CFA_advance_loc: 5 to 0000000000000005
>    DW_CFA_def_cfa: r10 (r10) ofs 0
>    DW_CFA_advance_loc: 9 to 000000000000000e
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>    DW_CFA_advance_loc: 13 to 000000000000001b
>    DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
>    DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
>    DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
>    DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
>    DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
> ...
> unwind info for that.  The problem is when async signal
> (or stepping through in the debugger) stops after the pushq %rbp
> instruction and before movq %rsp, %rbp, the unwind info says that
> caller's %rbp is saved there at *%rbp, but that is not true, caller's
> %rbp is either still available in the %rbp register, or in *%rsp,
> only after executing the next instruction - movq %rsp, %rbp - the
> location for %rbp is correct.  So, either we'd need to temporarily
> say:
>    DW_CFA_advance_loc: 9 to 000000000000000e
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
>    DW_CFA_advance_loc: 3 to 0000000000000011
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>    DW_CFA_advance_loc: 10 to 000000000000001b
> or to me it seems more compact to just say:
>    DW_CFA_advance_loc: 12 to 0000000000000011
>    DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>    DW_CFA_advance_loc: 10 to 000000000000001b
> 
> The following patch implements the latter.  It keeps the call/jump and
> flush queue handling as is and only moves the clobbers_queued_reg_save
> forced flushes from before the instruction to immediately after the first
> instruction.  My reading of libgcc unwinder is that while for calls it
> executes cfi instruction until < pc (but pc is the end of call anyway),
> for async signal stops it executes cfi instruction until <= pc, so it
> will only omit cfi instructions starting with advance to a higher loc.
> On the testcase from the PR, the difference with the patch is:
> @@ -16,9 +16,9 @@ Contents of the .eh_frame section:
>   00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
>     DW_CFA_advance_loc: 5 to 0000000000000005
>     DW_CFA_def_cfa: r10 (r10) ofs 0
> -  DW_CFA_advance_loc: 9 to 000000000000000e
> +  DW_CFA_advance_loc: 12 to 0000000000000011
>     DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
> -  DW_CFA_advance_loc: 13 to 000000000000001b
> +  DW_CFA_advance_loc: 10 to 000000000000001b
>     DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
>     DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
>     DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
> When looking at other changes, e.g. on cc1plus itself when it has
> identical objdump -dr, the without patch to with patch difference
> is e.g.:
>   0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d
>     DW_CFA_advance_loc: 1 to 0000000000e2fc81
>     DW_CFA_def_cfa_offset: 16
>     DW_CFA_offset: r6 (rbp) at cfa-16
>     DW_CFA_advance_loc: 3 to 0000000000e2fc84
>     DW_CFA_def_cfa_register: r6 (rbp)
> -  DW_CFA_advance_loc: 6 to 0000000000e2fc8a
> +  DW_CFA_advance_loc: 9 to 0000000000e2fc8d
>     DW_CFA_offset: r15 (r15) at cfa-24
>     DW_CFA_offset: r14 (r14) at cfa-32
>     DW_CFA_offset: r13 (r13) at cfa-40
> -  DW_CFA_advance_loc: 6 to 0000000000e2fc90
> +  DW_CFA_advance_loc: 5 to 0000000000e2fc92
>     DW_CFA_offset: r12 (r12) at cfa-48
>     DW_CFA_offset: r3 (rbx) at cfa-56
> -  DW_CFA_advance_loc2: 399 to 0000000000e2fe1f
> +  DW_CFA_advance_loc2: 397 to 0000000000e2fe1f
>     DW_CFA_remember_state
>     DW_CFA_def_cfa: r7 (rsp) ofs 8
>     DW_CFA_advance_loc: 1 to 0000000000e2fe20
>     DW_CFA_restore_state
>     DW_CFA_nop
>     DW_CFA_nop
> for
> 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>:
>    e2fc80:       55                      push   %rbp
>    e2fc81:       48 89 e5                mov    %rsp,%rbp
>    e2fc84:       41 57                   push   %r15
>    e2fc86:       41 56                   push   %r14
>    e2fc88:       41 55                   push   %r13
>    e2fc8a:       45 31 ed                xor    %r13d,%r13d
>    e2fc8d:       41 54                   push   %r12
>    e2fc8f:       53                      push   %rbx
>    e2fc90:       31 db                   xor    %ebx,%ebx
>    e2fc92:       48 81 ec 98 02 00 00    sub    $0x298,%rsp
>    e2fc99:       89 7d c4                mov    %edi,-0x3c(%rbp)
> ...
> prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40
> starts applying before or after the xorl %r13d,%r13d instruction
> and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying
> before or after the xorl %ebx,%ebx instruction.
> Or:
> -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
> +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af
>     DW_CFA_advance_loc: 1 to 0000000000e3d161
>     DW_CFA_def_cfa_offset: 16
>     DW_CFA_offset: r6 (rbp) at cfa-16
>     DW_CFA_advance_loc: 9 to 0000000000e3d16a
>     DW_CFA_def_cfa_register: r6 (rbp)
> -  DW_CFA_advance_loc: 2 to 0000000000e3d16c
> +  DW_CFA_advance_loc: 5 to 0000000000e3d16f
>     DW_CFA_offset: r12 (r12) at cfa-24
> -  DW_CFA_advance_loc1: 66 to 0000000000e3d1ae
> +  DW_CFA_advance_loc: 63 to 0000000000e3d1ae
>     DW_CFA_def_cfa: r7 (rsp) ofs 8
> -  DW_CFA_nop
> -  DW_CFA_nop
> -  DW_CFA_nop
> for:
> 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>:
>    e3d160:       55                      push   %rbp
>    e3d161:       48 63 c2                movslq %edx,%rax
>    e3d164:       49 89 f8                mov    %rdi,%r8
>    e3d167:       48 89 e5                mov    %rsp,%rbp
>    e3d16a:       41 54                   push   %r12
>    e3d16c:       49 89 c4                mov    %rax,%r12
>    e3d16f:       48 83 ec 08             sub    $0x8,%rsp
> again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying
> before movq %rax,%r12 instruction or after it.
> I think if for some instructions it is possible that debugger or signal
> would stop in the middle of instruction, with some side-effects of the
> instruction already done and others not yet, then we can't do that,
> but at the start of instructions that modify the register the side-effects
> of the instruction should not have taken effect yet.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.

LGTM.  This seems consistent with when we flush due to clobbers in 
dwarf2out_frame_debug.

> Will e.g. GDB be happy about the changes?

I would expect so, but it would be good to have someone from GDB verify.

> Thinking some more about this, what can be problematic on the GCC side
> is inline asm, that can and often will contain multiple instructions.
> For that is an easy fix, just test asm_noperands and handle
> clobbers_queued_reg_save before the insn rather than after in that case.

Sure, but does inline asm go through dwarf2out_frame_debug?

> But there is another problem, instruction patterns that emit multiple
> hw instructions, code can stop in between them.  So, do we want some
> instruction attribute that would conservatively tell us which instruction
> patterns are guaranteed to be single machine instructions?

It seems to me that in that situation you'd want to add the save to 
*%rsp, and still not update to *%rbp until after the combined instruction.

Incidentally, it seems a bit odd that clobbers_queued_reg_save returns 
true both for clobbering the saved register and for clobbering the save 
location; if I already know that %rbp is saved at *%rsp, why do I care 
that the insn we're looking at clobbers %rbp?  Maybe we should only 
check modified_in_p (q->reg, insn) if q->reg doesn't appear in 
regs_saved_in_regs.  That wouldn't help with this testcase, though.

> 2021-03-24  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR debug/99334
> 	* dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn)
> 	forced flushes of queued register saves for non-call/jump insns
> 	that can't throw, emit them first after the insn rather than
> 	before it.
> 
> --- gcc/dwarf2cfi.c.jj	2021-03-02 11:25:47.217727061 +0100
> +++ gcc/dwarf2cfi.c	2021-03-23 17:34:58.240281522 +0100
> @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e
>   	      dwarf2out_flush_queued_reg_saves ();
>   	    }
>   	  else if (!NONJUMP_INSN_P (insn)
> -		   || clobbers_queued_reg_save (insn)
>   		   || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL))
>   	    dwarf2out_flush_queued_reg_saves ();
>   	  any_cfis_emitted = false;
>   
>   	  add_cfi_insn = insn;
> +
> +	  if (queued_reg_saves.length () && clobbers_queued_reg_save (insn))
> +	    dwarf2out_flush_queued_reg_saves ();
> +
>   	  scan_insn_after (insn);
>   	  control = insn;
>   	}
> 
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
  2021-03-25 21:43 ` Jason Merrill
@ 2021-03-26 17:29   ` Jakub Jelinek
  2021-03-26 19:52     ` Jason Merrill
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2021-03-26 17:29 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Richard Henderson, gdb, gcc-patches, Mark Wielaard

On Thu, Mar 25, 2021 at 05:43:57PM -0400, Jason Merrill via Gcc-patches wrote:
> > Will e.g. GDB be happy about the changes?
> 
> I would expect so, but it would be good to have someone from GDB verify.

Ok, have verified it now on the testcase from the PR (stubbed out what it
calls, called it from main with all NULL pointers and just in gdb stepped
through the prologue and at each insn looked at
p $rsp
p $rbp
up
p $rsp
p $rbp
down
stepi
With unpatched gcc indeed it works everywhere except when stopping
at the start of movq %rsp, %rbp instruction where $rbp in the parent
frame was unrelated garbage.
And with the patch below, I get the expected values in all cases.

> > Thinking some more about this, what can be problematic on the GCC side
> > is inline asm, that can and often will contain multiple instructions.
> > For that is an easy fix, just test asm_noperands and handle
> > clobbers_queued_reg_save before the insn rather than after in that case.
> 
> Sure, but does inline asm go through dwarf2out_frame_debug?

Not through dwarf2out_frame_debug, as inline asm is not RTX_FRAME_RELATED_P,
but it goes through that scan_trace hunk I was changing like any other
insn (except for BARRIERs, instructions not in basic blocks, some NOTEs
and DEBUG_INSNs).  And clobbers_queued_reg_save (insn) can be true for the
inline asm either because it stores some registers or because it clobbers
them.
And then there are other backend insns that expand to multiple hw
instructions, e.g. in the i386 backend typically (but not sure if we
guarantee that) marked as get_attr_type (insn) == TYPE_MULTI.
So, if we wanted to do what my patch did, we would need to come up
with some new (optional) insn attribute, say get_attr_undivisible (insn),
e.g. define that attribute on i386 to "type" attribute equal to "multi"
by default and perhaps with some exceptions if needed, and use that
attribute if HAVE_ATTR_undivisible in dwarf2cfi.c to decide whether it can
be emitted after the insn or needs to be emitted before it.

> > But there is another problem, instruction patterns that emit multiple
> > hw instructions, code can stop in between them.  So, do we want some
> > instruction attribute that would conservatively tell us which instruction
> > patterns are guaranteed to be single machine instructions?
> 
> It seems to me that in that situation you'd want to add the save to *%rsp,
> and still not update to *%rbp until after the combined instruction.

So, today I've tried instead to deal with it through REG_FRAME_RELATED_EXPR
from the backend, but that failed miserably as explained in the PR,
dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the
dynamic stack realignment using drap register that only the i386 backend
does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we
can't emulate those rules.  The following patch instead does the deferring
of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and
emits it on the (set hfp sp) assignment that must appear shortly after it
and adds assertion that it is the case.

The difference before/after the patch on the assembly is:
--- pr99334.s~	2021-03-26 15:42:40.881749380 +0100
+++ pr99334.s	2021-03-26 17:38:05.729161910 +0100
@@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
 	andq	$-16, %rsp
 	pushq	-8(%r10)
 	pushq	%rbp
-	.cfi_escape 0x10,0x6,0x2,0x76,0
 	movq	%rsp, %rbp
+	.cfi_escape 0x10,0x6,0x2,0x76,0
 	pushq	%r15
 	pushq	%r14
 	pushq	%r13
i.e. does just what we IMHO need, after pushq %rbp %rbp
still contains parent's frame value and so the save rule doesn't
need to be overridden there, ditto at the start of the next insn
before the side-effect took effect, and we override it only after
it when %rbp already has the right value.

If some other target adds dynamic stack realignment in the future and
the offset 0 case wouldn't be true there, the code can be adjusted so that
it works on all the drap architectures, I'm pretty sure the code would
need other adjustments too.

For the rule 18 and for the (set hfp sp) after it we already have asserts
for the drap cases that check whether the code looks the way i?86/x86_64
emit it currently.

Is this ok for trunk if it passes bootstrap/regtest on x86_64-linux and
i686-linux (I've verified no other target has the drap stuff right now,
nvptx does nvptx_get_drap_rtx but no other realignment stack (unclear why?),
but nvptx is a non-RA target so dwarf2cfi doesn't apply to it)?

2021-03-26  Jakub Jelinek  <jakub@redhat.com>

	PR debug/99334
	* dwarf2out.h (struct dw_fde_node): Add rule18 member.
	* dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp)
	assignment with drap_reg active, queue reg save for hfp with offset 0
	and flush queued reg saves.  When handling a push with rule18,
	defer queueing reg save for hfp and just assert the offset is 0.
	(scan_trace): Assert that fde->rule18 is false.

--- gcc/dwarf2out.h.jj	2021-01-04 10:25:38.718235126 +0100
+++ gcc/dwarf2out.h	2021-03-26 17:12:30.599007479 +0100
@@ -108,6 +108,12 @@ struct GTY(()) dw_fde_node {
   /* True iff dw_fde_second_begin label is in text_section or
      cold_text_section.  */
   unsigned second_in_std_section : 1;
+  /* True if Rule 18 described in dwarf2cfi.c is in action, i.e. for dynamic
+     stack realignment in between pushing of hard frame pointer to stack
+     and setting hard frame pointer to stack pointer.  The register save for
+     hard frame pointer register should be emitted only on the latter
+     instruction.  */
+  unsigned rule18 : 1;
 };

--- gcc/dwarf2cfi.c.jj	2021-03-23 19:42:05.000000000 +0100
+++ gcc/dwarf2cfi.c	2021-03-26 17:31:58.133165709 +0100
@@ -1695,9 +1695,19 @@ dwarf2out_frame_debug_expr (rtx expr)
 	      if (fde
 		  && fde->stack_realign
 		  && REGNO (src) == STACK_POINTER_REGNUM)
-		gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM
-			    && fde->drap_reg != INVALID_REGNUM
-			    && cur_cfa->reg != dwf_regno (src));
+		{
+		  gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM
+			      && fde->drap_reg != INVALID_REGNUM
+			      && cur_cfa->reg != dwf_regno (src)
+			      && fde->rule18);
+		  fde->rule18 = 0;
+		  /* The save of hard frame pointer has been deferred
+		     until this point when Rule 18 applied.  Emit it now.  */
+		  queue_reg_save (dest, NULL_RTX, 0);
+		  /* And as the instruction modifies the hard frame pointer,
+		     flush the queue as well.  */
+		  dwarf2out_flush_queued_reg_saves ();
+		}
 	      else
 		queue_reg_save (src, dest, 0);
 	    }
@@ -1907,6 +1917,7 @@ dwarf2out_frame_debug_expr (rtx expr)
 	    {
 	      gcc_assert (cur_cfa->reg != dw_frame_pointer_regnum);
 	      cur_trace->cfa_store.offset = 0;
+	      fde->rule18 = 1;
 	    }

 	  if (cur_cfa->reg == dw_stack_pointer_regnum)
@@ -2041,7 +2052,17 @@ dwarf2out_frame_debug_expr (rtx expr)
 	span = NULL;

       if (!span)
-	queue_reg_save (src, NULL_RTX, offset);
+	{
+	  if (fde->rule18)
+	    /* Just verify the hard frame pointer save when doing dynamic
+	       realignment uses expected offset.  The actual queue_reg_save
+	       needs to be deferred until the instruction that sets
+	       hard frame pointer to stack pointer, see PR99334 for
+	       details.  */
+	    gcc_assert (known_eq (offset, 0));
+	  else
+	    queue_reg_save (src, NULL_RTX, offset);
+	}
       else
 	{
 	  /* We have a PARALLEL describing where the contents of SRC live.
@@ -2732,6 +2753,7 @@ scan_trace (dw_trace_info *trace, bool e
       create_trace_edges (control);
     }

+  gcc_assert (!cfun->fde || !cfun->fde->rule18);
   add_cfi_insn = NULL;
   cur_row = NULL;
   cur_trace = NULL;

	Jakub

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334]
  2021-03-26 17:29   ` Jakub Jelinek
@ 2021-03-26 19:52     ` Jason Merrill
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Merrill @ 2021-03-26 19:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Henderson, gdb, gcc-patches, Mark Wielaard

On 3/26/21 1:29 PM, Jakub Jelinek wrote:
> On Thu, Mar 25, 2021 at 05:43:57PM -0400, Jason Merrill via Gcc-patches wrote:
>>> Will e.g. GDB be happy about the changes?
>>
>> I would expect so, but it would be good to have someone from GDB verify.
> 
> Ok, have verified it now on the testcase from the PR (stubbed out what it
> calls, called it from main with all NULL pointers and just in gdb stepped
> through the prologue and at each insn looked at
> p $rsp
> p $rbp
> up
> p $rsp
> p $rbp
> down
> stepi
> With unpatched gcc indeed it works everywhere except when stopping
> at the start of movq %rsp, %rbp instruction where $rbp in the parent
> frame was unrelated garbage.
> And with the patch below, I get the expected values in all cases.
> 
>>> Thinking some more about this, what can be problematic on the GCC side
>>> is inline asm, that can and often will contain multiple instructions.
>>> For that is an easy fix, just test asm_noperands and handle
>>> clobbers_queued_reg_save before the insn rather than after in that case.
>>
>> Sure, but does inline asm go through dwarf2out_frame_debug?
> 
> Not through dwarf2out_frame_debug, as inline asm is not RTX_FRAME_RELATED_P,
> but it goes through that scan_trace hunk I was changing like any other
> insn (except for BARRIERs, instructions not in basic blocks, some NOTEs
> and DEBUG_INSNs).  And clobbers_queued_reg_save (insn) can be true for the
> inline asm either because it stores some registers or because it clobbers
> them.
> And then there are other backend insns that expand to multiple hw
> instructions, e.g. in the i386 backend typically (but not sure if we
> guarantee that) marked as get_attr_type (insn) == TYPE_MULTI.
> So, if we wanted to do what my patch did, we would need to come up
> with some new (optional) insn attribute, say get_attr_undivisible (insn),
> e.g. define that attribute on i386 to "type" attribute equal to "multi"
> by default and perhaps with some exceptions if needed, and use that
> attribute if HAVE_ATTR_undivisible in dwarf2cfi.c to decide whether it can
> be emitted after the insn or needs to be emitted before it.
> 
>>> But there is another problem, instruction patterns that emit multiple
>>> hw instructions, code can stop in between them.  So, do we want some
>>> instruction attribute that would conservatively tell us which instruction
>>> patterns are guaranteed to be single machine instructions?
>>
>> It seems to me that in that situation you'd want to add the save to *%rsp,
>> and still not update to *%rbp until after the combined instruction.
> 
> So, today I've tried instead to deal with it through REG_FRAME_RELATED_EXPR
> from the backend, but that failed miserably as explained in the PR,
> dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the
> dynamic stack realignment using drap register that only the i386 backend
> does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we
> can't emulate those rules.  The following patch instead does the deferring
> of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and
> emits it on the (set hfp sp) assignment that must appear shortly after it
> and adds assertion that it is the case.
> 
> The difference before/after the patch on the assembly is:
> --- pr99334.s~	2021-03-26 15:42:40.881749380 +0100
> +++ pr99334.s	2021-03-26 17:38:05.729161910 +0100
> @@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
>   	andq	$-16, %rsp
>   	pushq	-8(%r10)
>   	pushq	%rbp
> -	.cfi_escape 0x10,0x6,0x2,0x76,0
>   	movq	%rsp, %rbp
> +	.cfi_escape 0x10,0x6,0x2,0x76,0
>   	pushq	%r15
>   	pushq	%r14
>   	pushq	%r13
> i.e. does just what we IMHO need, after pushq %rbp %rbp
> still contains parent's frame value and so the save rule doesn't
> need to be overridden there, ditto at the start of the next insn
> before the side-effect took effect, and we override it only after
> it when %rbp already has the right value.
> 
> If some other target adds dynamic stack realignment in the future and
> the offset 0 case wouldn't be true there, the code can be adjusted so that
> it works on all the drap architectures, I'm pretty sure the code would
> need other adjustments too.
> 
> For the rule 18 and for the (set hfp sp) after it we already have asserts
> for the drap cases that check whether the code looks the way i?86/x86_64
> emit it currently.
> 
> Is this ok for trunk if it passes bootstrap/regtest on x86_64-linux and
> i686-linux (I've verified no other target has the drap stuff right now,
> nvptx does nvptx_get_drap_rtx but no other realignment stack (unclear why?),
> but nvptx is a non-RA target so dwarf2cfi doesn't apply to it)?

I looked into the issue a bit.  So the problem is that after push %rbp, 
the queued reg save is trying to say that %rbp is now saved at CFA 
offset 0, which is correct.  Except that then we hit

>       /* When stack is aligned, store REG using DW_CFA_expression with FP.  */
>       if (fde && fde->stack_realign)
>         {
>           cfi->dw_cfi_opc = DW_CFA_expression;
>           cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
>           cfi->dw_cfi_oprnd2.dw_cfi_loc
>             = build_cfa_aligned_loc (&cur_row->cfa, offset,
>                                      fde->stack_realignment);
>         }

Which turns it into saying %rbp is saved at an offset from %rbp, which 
is not yet true; we can't safely do this transformation until the FP is 
actually set up.

This code goes back to r138335, which introduced the DRAP handling.

Your patch means that we don't emit the save until after the FP is set 
up, which works since this instruction is atomic: before the 
instruction, %rbp is in %rbp; after the instruction, %rbp is at *%rbp.

Since this is specifically a problem with setting the frame pointer, 
your patch seems like a reasonable solution, assuming no other reg saves 
are ever emitted before this one.

The patch is OK.

> 2021-03-26  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR debug/99334
> 	* dwarf2out.h (struct dw_fde_node): Add rule18 member.
> 	* dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp)
> 	assignment with drap_reg active, queue reg save for hfp with offset 0
> 	and flush queued reg saves.  When handling a push with rule18,
> 	defer queueing reg save for hfp and just assert the offset is 0.
> 	(scan_trace): Assert that fde->rule18 is false.
> 
> --- gcc/dwarf2out.h.jj	2021-01-04 10:25:38.718235126 +0100
> +++ gcc/dwarf2out.h	2021-03-26 17:12:30.599007479 +0100
> @@ -108,6 +108,12 @@ struct GTY(()) dw_fde_node {
>     /* True iff dw_fde_second_begin label is in text_section or
>        cold_text_section.  */
>     unsigned second_in_std_section : 1;
> +  /* True if Rule 18 described in dwarf2cfi.c is in action, i.e. for dynamic
> +     stack realignment in between pushing of hard frame pointer to stack
> +     and setting hard frame pointer to stack pointer.  The register save for
> +     hard frame pointer register should be emitted only on the latter
> +     instruction.  */
> +  unsigned rule18 : 1;
>   };
>   
>   
> --- gcc/dwarf2cfi.c.jj	2021-03-23 19:42:05.000000000 +0100
> +++ gcc/dwarf2cfi.c	2021-03-26 17:31:58.133165709 +0100
> @@ -1695,9 +1695,19 @@ dwarf2out_frame_debug_expr (rtx expr)
>   	      if (fde
>   		  && fde->stack_realign
>   		  && REGNO (src) == STACK_POINTER_REGNUM)
> -		gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM
> -			    && fde->drap_reg != INVALID_REGNUM
> -			    && cur_cfa->reg != dwf_regno (src));
> +		{
> +		  gcc_assert (REGNO (dest) == HARD_FRAME_POINTER_REGNUM
> +			      && fde->drap_reg != INVALID_REGNUM
> +			      && cur_cfa->reg != dwf_regno (src)
> +			      && fde->rule18);
> +		  fde->rule18 = 0;
> +		  /* The save of hard frame pointer has been deferred
> +		     until this point when Rule 18 applied.  Emit it now.  */
> +		  queue_reg_save (dest, NULL_RTX, 0);
> +		  /* And as the instruction modifies the hard frame pointer,
> +		     flush the queue as well.  */
> +		  dwarf2out_flush_queued_reg_saves ();
> +		}
>   	      else
>   		queue_reg_save (src, dest, 0);
>   	    }
> @@ -1907,6 +1917,7 @@ dwarf2out_frame_debug_expr (rtx expr)
>   	    {
>   	      gcc_assert (cur_cfa->reg != dw_frame_pointer_regnum);
>   	      cur_trace->cfa_store.offset = 0;
> +	      fde->rule18 = 1;
>   	    }
>   
>   	  if (cur_cfa->reg == dw_stack_pointer_regnum)
> @@ -2041,7 +2052,17 @@ dwarf2out_frame_debug_expr (rtx expr)
>   	span = NULL;
>   
>         if (!span)
> -	queue_reg_save (src, NULL_RTX, offset);
> +	{
> +	  if (fde->rule18)
> +	    /* Just verify the hard frame pointer save when doing dynamic
> +	       realignment uses expected offset.  The actual queue_reg_save
> +	       needs to be deferred until the instruction that sets
> +	       hard frame pointer to stack pointer, see PR99334 for
> +	       details.  */
> +	    gcc_assert (known_eq (offset, 0));
> +	  else
> +	    queue_reg_save (src, NULL_RTX, offset);
> +	}
>         else
>   	{
>   	  /* We have a PARALLEL describing where the contents of SRC live.
> @@ -2732,6 +2753,7 @@ scan_trace (dw_trace_info *trace, bool e
>         create_trace_edges (control);
>       }
>   
> +  gcc_assert (!cfun->fde || !cfun->fde->rule18);
>     add_cfi_insn = NULL;
>     cur_row = NULL;
>     cur_trace = NULL;
> 
> 
> 	Jakub
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-03-26 19:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24  9:31 [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] Jakub Jelinek
2021-03-25 21:43 ` Jason Merrill
2021-03-26 17:29   ` Jakub Jelinek
2021-03-26 19:52     ` Jason Merrill

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).