From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id E2B7D3861030 for ; Wed, 24 Mar 2021 09:31:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org E2B7D3861030 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-201-i0y1tVtsORK9GjqRtXt7Uw-1; Wed, 24 Mar 2021 05:31:41 -0400 X-MC-Unique: i0y1tVtsORK9GjqRtXt7Uw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 61BE1612A6; Wed, 24 Mar 2021 09:31:40 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-112-95.ams2.redhat.com [10.36.112.95]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C50DA5447C; Wed, 24 Mar 2021 09:31:36 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 12O9VXYm1154944 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 24 Mar 2021 10:31:34 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 12O9VWSN1154943; Wed, 24 Mar 2021 10:31:32 +0100 Date: Wed, 24 Mar 2021 10:31:32 +0100 From: Jakub Jelinek To: Richard Henderson , Jason Merrill Cc: gcc-patches@gcc.gnu.org, gdb@sourceware.org, Mark Wielaard Subject: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] Message-ID: <20210324093132.GE186063@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2021 09:31:45 -0000 Hi! On the testcase in the PR with -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign we have prologue: 0000000000000000 <_func_with_dwarf_issue_>: 0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10 5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp 9: 41 ff 72 f8 pushq -0x8(%r10) d: 55 push %rbp e: 48 89 e5 mov %rsp,%rbp 11: 41 57 push %r15 13: 41 56 push %r14 15: 41 55 push %r13 17: 41 54 push %r12 19: 41 52 push %r10 1b: 53 push %rbx 1c: 48 83 ec 20 sub $0x20,%rsp and emit 00000000 0000000000000014 00000000 CIE Version: 1 Augmentation: "zR" Code alignment factor: 1 Data alignment factor: -8 Return address column: 16 Augmentation data: 1b DW_CFA_def_cfa: r7 (rsp) ofs 8 DW_CFA_offset: r16 (rip) at cfa-8 DW_CFA_nop DW_CFA_nop 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 DW_CFA_advance_loc: 5 to 0000000000000005 DW_CFA_def_cfa: r10 (r10) ofs 0 DW_CFA_advance_loc: 9 to 000000000000000e DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) DW_CFA_advance_loc: 13 to 000000000000001b DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24) DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32) ... unwind info for that. The problem is when async signal (or stepping through in the debugger) stops after the pushq %rbp instruction and before movq %rsp, %rbp, the unwind info says that caller's %rbp is saved there at *%rbp, but that is not true, caller's %rbp is either still available in the %rbp register, or in *%rsp, only after executing the next instruction - movq %rsp, %rbp - the location for %rbp is correct. So, either we'd need to temporarily say: DW_CFA_advance_loc: 9 to 000000000000000e DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0) DW_CFA_advance_loc: 3 to 0000000000000011 DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) DW_CFA_advance_loc: 10 to 000000000000001b or to me it seems more compact to just say: DW_CFA_advance_loc: 12 to 0000000000000011 DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) DW_CFA_advance_loc: 10 to 000000000000001b The following patch implements the latter. It keeps the call/jump and flush queue handling as is and only moves the clobbers_queued_reg_save forced flushes from before the instruction to immediately after the first instruction. My reading of libgcc unwinder is that while for calls it executes cfi instruction until < pc (but pc is the end of call anyway), for async signal stops it executes cfi instruction until <= pc, so it will only omit cfi instructions starting with advance to a higher loc. On the testcase from the PR, the difference with the patch is: @@ -16,9 +16,9 @@ Contents of the .eh_frame section: 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 DW_CFA_advance_loc: 5 to 0000000000000005 DW_CFA_def_cfa: r10 (r10) ofs 0 - DW_CFA_advance_loc: 9 to 000000000000000e + DW_CFA_advance_loc: 12 to 0000000000000011 DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) - DW_CFA_advance_loc: 13 to 000000000000001b + DW_CFA_advance_loc: 10 to 000000000000001b DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) When looking at other changes, e.g. on cc1plus itself when it has identical objdump -dr, the without patch to with patch difference is e.g.: 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d DW_CFA_advance_loc: 1 to 0000000000e2fc81 DW_CFA_def_cfa_offset: 16 DW_CFA_offset: r6 (rbp) at cfa-16 DW_CFA_advance_loc: 3 to 0000000000e2fc84 DW_CFA_def_cfa_register: r6 (rbp) - DW_CFA_advance_loc: 6 to 0000000000e2fc8a + DW_CFA_advance_loc: 9 to 0000000000e2fc8d DW_CFA_offset: r15 (r15) at cfa-24 DW_CFA_offset: r14 (r14) at cfa-32 DW_CFA_offset: r13 (r13) at cfa-40 - DW_CFA_advance_loc: 6 to 0000000000e2fc90 + DW_CFA_advance_loc: 5 to 0000000000e2fc92 DW_CFA_offset: r12 (r12) at cfa-48 DW_CFA_offset: r3 (rbx) at cfa-56 - DW_CFA_advance_loc2: 399 to 0000000000e2fe1f + DW_CFA_advance_loc2: 397 to 0000000000e2fe1f DW_CFA_remember_state DW_CFA_def_cfa: r7 (rsp) ofs 8 DW_CFA_advance_loc: 1 to 0000000000e2fe20 DW_CFA_restore_state DW_CFA_nop DW_CFA_nop for 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>: e2fc80: 55 push %rbp e2fc81: 48 89 e5 mov %rsp,%rbp e2fc84: 41 57 push %r15 e2fc86: 41 56 push %r14 e2fc88: 41 55 push %r13 e2fc8a: 45 31 ed xor %r13d,%r13d e2fc8d: 41 54 push %r12 e2fc8f: 53 push %rbx e2fc90: 31 db xor %ebx,%ebx e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp) ... prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40 starts applying before or after the xorl %r13d,%r13d instruction and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying before or after the xorl %ebx,%ebx instruction. Or: -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af DW_CFA_advance_loc: 1 to 0000000000e3d161 DW_CFA_def_cfa_offset: 16 DW_CFA_offset: r6 (rbp) at cfa-16 DW_CFA_advance_loc: 9 to 0000000000e3d16a DW_CFA_def_cfa_register: r6 (rbp) - DW_CFA_advance_loc: 2 to 0000000000e3d16c + DW_CFA_advance_loc: 5 to 0000000000e3d16f DW_CFA_offset: r12 (r12) at cfa-24 - DW_CFA_advance_loc1: 66 to 0000000000e3d1ae + DW_CFA_advance_loc: 63 to 0000000000e3d1ae DW_CFA_def_cfa: r7 (rsp) ofs 8 - DW_CFA_nop - DW_CFA_nop - DW_CFA_nop for: 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>: e3d160: 55 push %rbp e3d161: 48 63 c2 movslq %edx,%rax e3d164: 49 89 f8 mov %rdi,%r8 e3d167: 48 89 e5 mov %rsp,%rbp e3d16a: 41 54 push %r12 e3d16c: 49 89 c4 mov %rax,%r12 e3d16f: 48 83 ec 08 sub $0x8,%rsp again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying before movq %rax,%r12 instruction or after it. I think if for some instructions it is possible that debugger or signal would stop in the middle of instruction, with some side-effects of the instruction already done and others not yet, then we can't do that, but at the start of instructions that modify the register the side-effects of the instruction should not have taken effect yet. Bootstrapped/regtested on x86_64-linux and i686-linux. Will e.g. GDB be happy about the changes? Thinking some more about this, what can be problematic on the GCC side is inline asm, that can and often will contain multiple instructions. For that is an easy fix, just test asm_noperands and handle clobbers_queued_reg_save before the insn rather than after in that case. But there is another problem, instruction patterns that emit multiple hw instructions, code can stop in between them. So, do we want some instruction attribute that would conservatively tell us which instruction patterns are guaranteed to be single machine instructions? 2021-03-24 Jakub Jelinek PR debug/99334 * dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn) forced flushes of queued register saves for non-call/jump insns that can't throw, emit them first after the insn rather than before it. --- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100 +++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100 @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e dwarf2out_flush_queued_reg_saves (); } else if (!NONJUMP_INSN_P (insn) - || clobbers_queued_reg_save (insn) || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL)) dwarf2out_flush_queued_reg_saves (); any_cfis_emitted = false; add_cfi_insn = insn; + + if (queued_reg_saves.length () && clobbers_queued_reg_save (insn)) + dwarf2out_flush_queued_reg_saves (); + scan_insn_after (insn); control = insn; } Jakub