From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 21EB73858001 for ; Thu, 25 Mar 2021 21:44:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 21EB73858001 Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-19-O6kLFNqNO4GcqZTaP6YKvg-1; Thu, 25 Mar 2021 17:43:59 -0400 X-MC-Unique: O6kLFNqNO4GcqZTaP6YKvg-1 Received: by mail-qv1-f71.google.com with SMTP id da16so4488156qvb.2 for ; Thu, 25 Mar 2021 14:43:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=CFMIhUnQzLf7MupzoLS0lKQRJOoaBRCqzjhXvBd87R8=; b=PXh9hhlsrmsIp/UbdtJEHkFUtuMCxhe18Z8xJ+s48PbDcOu9n/vZ+NBF6NZA6rjD1H X5/aObFp9L8pARLvxdiBLcbex53kyUGXBOFOqzr8i+NsvWgIL71SHJy+3jr8NvVxToMK tbBgAt5gozkvGbZvK1yY8A2lI7nsfm0wt6qrMgGTfgWXiS7TfDPWdgP5kZRdktLK0I7m mvSufZ7dj0zNv416Lhl5ITwcOV9mvdMzJA6vawSNuyVxE4jfa1HIedHWEmjE5/jQQqxz eRVkSEsu6Ud694JlvHNMhIJVJBS4S0NlItCtD87qk9QEca1BTbU40xmFtoggi/EHYpsF 4pHw== X-Gm-Message-State: AOAM532fNuqzU997/KmQQnlV+Vyq5ylIAxPowfYzzCBmlAY/+DR49m79 S0/uLHggmWseFCEnjtJQffMTxAx43aRBgkMcamb3X37JnhnHyXtWGJ5rfGWjp7xUUnYYow2SA9c A8n7x+wsDaKw= X-Received: by 2002:a37:4553:: with SMTP id s80mr10175712qka.167.1616708639083; Thu, 25 Mar 2021 14:43:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzjq7lLhoGt1Xn7QiXJ2D0SUX6zADGYepIXothzj3FcVQ90dw62Ed7/Ij3piaU4DYph/C0pLw== X-Received: by 2002:a37:4553:: with SMTP id s80mr10175689qka.167.1616708638661; Thu, 25 Mar 2021 14:43:58 -0700 (PDT) Received: from [192.168.1.148] (209-6-216-142.s141.c3-0.smr-cbr1.sbo-smr.ma.cable.rcncustomer.com. [209.6.216.142]) by smtp.gmail.com with ESMTPSA id 33sm4589880qtf.78.2021.03.25.14.43.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 25 Mar 2021 14:43:57 -0700 (PDT) Subject: Re: [PATCH] dwarf2cfi: Defer queued register saves some more [PR99334] To: Jakub Jelinek , Richard Henderson Cc: gcc-patches@gcc.gnu.org, gdb@sourceware.org, Mark Wielaard References: <20210324093132.GE186063@tucnak> From: Jason Merrill Message-ID: <0d60aeed-783d-1128-65e5-fc24c504bf20@redhat.com> Date: Thu, 25 Mar 2021 17:43:57 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: <20210324093132.GE186063@tucnak> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2021 21:44:04 -0000 On 3/24/21 5:31 AM, Jakub Jelinek wrote: > Hi! > > On the testcase in the PR with > -fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign > we have prologue: > 0000000000000000 <_func_with_dwarf_issue_>: > 0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10 > 5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp > 9: 41 ff 72 f8 pushq -0x8(%r10) > d: 55 push %rbp > e: 48 89 e5 mov %rsp,%rbp > 11: 41 57 push %r15 > 13: 41 56 push %r14 > 15: 41 55 push %r13 > 17: 41 54 push %r12 > 19: 41 52 push %r10 > 1b: 53 push %rbx > 1c: 48 83 ec 20 sub $0x20,%rsp > and emit > 00000000 0000000000000014 00000000 CIE > Version: 1 > Augmentation: "zR" > Code alignment factor: 1 > Data alignment factor: -8 > Return address column: 16 > Augmentation data: 1b > DW_CFA_def_cfa: r7 (rsp) ofs 8 > DW_CFA_offset: r16 (rip) at cfa-8 > DW_CFA_nop > DW_CFA_nop > > 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 > DW_CFA_advance_loc: 5 to 0000000000000005 > DW_CFA_def_cfa: r10 (r10) ofs 0 > DW_CFA_advance_loc: 9 to 000000000000000e > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > DW_CFA_advance_loc: 13 to 000000000000001b > DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) > DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) > DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) > DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24) > DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32) > ... > unwind info for that. The problem is when async signal > (or stepping through in the debugger) stops after the pushq %rbp > instruction and before movq %rsp, %rbp, the unwind info says that > caller's %rbp is saved there at *%rbp, but that is not true, caller's > %rbp is either still available in the %rbp register, or in *%rsp, > only after executing the next instruction - movq %rsp, %rbp - the > location for %rbp is correct. So, either we'd need to temporarily > say: > DW_CFA_advance_loc: 9 to 000000000000000e > DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0) > DW_CFA_advance_loc: 3 to 0000000000000011 > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > DW_CFA_advance_loc: 10 to 000000000000001b > or to me it seems more compact to just say: > DW_CFA_advance_loc: 12 to 0000000000000011 > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > DW_CFA_advance_loc: 10 to 000000000000001b > > The following patch implements the latter. It keeps the call/jump and > flush queue handling as is and only moves the clobbers_queued_reg_save > forced flushes from before the instruction to immediately after the first > instruction. My reading of libgcc unwinder is that while for calls it > executes cfi instruction until < pc (but pc is the end of call anyway), > for async signal stops it executes cfi instruction until <= pc, so it > will only omit cfi instructions starting with advance to a higher loc. > On the testcase from the PR, the difference with the patch is: > @@ -16,9 +16,9 @@ Contents of the .eh_frame section: > 00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5 > DW_CFA_advance_loc: 5 to 0000000000000005 > DW_CFA_def_cfa: r10 (r10) ofs 0 > - DW_CFA_advance_loc: 9 to 000000000000000e > + DW_CFA_advance_loc: 12 to 0000000000000011 > DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0) > - DW_CFA_advance_loc: 13 to 000000000000001b > + DW_CFA_advance_loc: 10 to 000000000000001b > DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref) > DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8) > DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16) > When looking at other changes, e.g. on cc1plus itself when it has > identical objdump -dr, the without patch to with patch difference > is e.g.: > 0005f388 000000000000002c 0005f38c FDE cie=00000000 pc=0000000000e2fc80..0000000000e2fe8d > DW_CFA_advance_loc: 1 to 0000000000e2fc81 > DW_CFA_def_cfa_offset: 16 > DW_CFA_offset: r6 (rbp) at cfa-16 > DW_CFA_advance_loc: 3 to 0000000000e2fc84 > DW_CFA_def_cfa_register: r6 (rbp) > - DW_CFA_advance_loc: 6 to 0000000000e2fc8a > + DW_CFA_advance_loc: 9 to 0000000000e2fc8d > DW_CFA_offset: r15 (r15) at cfa-24 > DW_CFA_offset: r14 (r14) at cfa-32 > DW_CFA_offset: r13 (r13) at cfa-40 > - DW_CFA_advance_loc: 6 to 0000000000e2fc90 > + DW_CFA_advance_loc: 5 to 0000000000e2fc92 > DW_CFA_offset: r12 (r12) at cfa-48 > DW_CFA_offset: r3 (rbx) at cfa-56 > - DW_CFA_advance_loc2: 399 to 0000000000e2fe1f > + DW_CFA_advance_loc2: 397 to 0000000000e2fe1f > DW_CFA_remember_state > DW_CFA_def_cfa: r7 (rsp) ofs 8 > DW_CFA_advance_loc: 1 to 0000000000e2fe20 > DW_CFA_restore_state > DW_CFA_nop > DW_CFA_nop > for > 0000000000e2fc80 <_ZL13result_vectoriP7rtx_def>: > e2fc80: 55 push %rbp > e2fc81: 48 89 e5 mov %rsp,%rbp > e2fc84: 41 57 push %r15 > e2fc86: 41 56 push %r14 > e2fc88: 41 55 push %r13 > e2fc8a: 45 31 ed xor %r13d,%r13d > e2fc8d: 41 54 push %r12 > e2fc8f: 53 push %rbx > e2fc90: 31 db xor %ebx,%ebx > e2fc92: 48 81 ec 98 02 00 00 sub $0x298,%rsp > e2fc99: 89 7d c4 mov %edi,-0x3c(%rbp) > ... > prologue, i.e. the change is whether DW_CFA_offset: r13 (r13) at cfa-40 > starts applying before or after the xorl %r13d,%r13d instruction > and whether DW_CFA_offset: r3 (rbx) at cfa-56 starts applying > before or after the xorl %ebx,%ebx instruction. > Or: > -00061590 0000000000000020 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af > +00061590 000000000000001c 00061594 FDE cie=00000000 pc=0000000000e3d160..0000000000e3d1af > DW_CFA_advance_loc: 1 to 0000000000e3d161 > DW_CFA_def_cfa_offset: 16 > DW_CFA_offset: r6 (rbp) at cfa-16 > DW_CFA_advance_loc: 9 to 0000000000e3d16a > DW_CFA_def_cfa_register: r6 (rbp) > - DW_CFA_advance_loc: 2 to 0000000000e3d16c > + DW_CFA_advance_loc: 5 to 0000000000e3d16f > DW_CFA_offset: r12 (r12) at cfa-24 > - DW_CFA_advance_loc1: 66 to 0000000000e3d1ae > + DW_CFA_advance_loc: 63 to 0000000000e3d1ae > DW_CFA_def_cfa: r7 (rsp) ofs 8 > - DW_CFA_nop > - DW_CFA_nop > - DW_CFA_nop > for: > 0000000000e3d160 <_Z23builtin_memset_read_strPvl15scalar_int_mode>: > e3d160: 55 push %rbp > e3d161: 48 63 c2 movslq %edx,%rax > e3d164: 49 89 f8 mov %rdi,%r8 > e3d167: 48 89 e5 mov %rsp,%rbp > e3d16a: 41 54 push %r12 > e3d16c: 49 89 c4 mov %rax,%r12 > e3d16f: 48 83 ec 08 sub $0x8,%rsp > again, whether DW_CFA_offset: r12 (r12) at cfa-24 starts applying > before movq %rax,%r12 instruction or after it. > I think if for some instructions it is possible that debugger or signal > would stop in the middle of instruction, with some side-effects of the > instruction already done and others not yet, then we can't do that, > but at the start of instructions that modify the register the side-effects > of the instruction should not have taken effect yet. > > Bootstrapped/regtested on x86_64-linux and i686-linux. LGTM. This seems consistent with when we flush due to clobbers in dwarf2out_frame_debug. > Will e.g. GDB be happy about the changes? I would expect so, but it would be good to have someone from GDB verify. > Thinking some more about this, what can be problematic on the GCC side > is inline asm, that can and often will contain multiple instructions. > For that is an easy fix, just test asm_noperands and handle > clobbers_queued_reg_save before the insn rather than after in that case. Sure, but does inline asm go through dwarf2out_frame_debug? > But there is another problem, instruction patterns that emit multiple > hw instructions, code can stop in between them. So, do we want some > instruction attribute that would conservatively tell us which instruction > patterns are guaranteed to be single machine instructions? It seems to me that in that situation you'd want to add the save to *%rsp, and still not update to *%rbp until after the combined instruction. Incidentally, it seems a bit odd that clobbers_queued_reg_save returns true both for clobbering the saved register and for clobbering the save location; if I already know that %rbp is saved at *%rsp, why do I care that the insn we're looking at clobbers %rbp? Maybe we should only check modified_in_p (q->reg, insn) if q->reg doesn't appear in regs_saved_in_regs. That wouldn't help with this testcase, though. > 2021-03-24 Jakub Jelinek > > PR debug/99334 > * dwarf2cfi.c (scan_trace): For clobbers_queued_reg_save (insn) > forced flushes of queued register saves for non-call/jump insns > that can't throw, emit them first after the insn rather than > before it. > > --- gcc/dwarf2cfi.c.jj 2021-03-02 11:25:47.217727061 +0100 > +++ gcc/dwarf2cfi.c 2021-03-23 17:34:58.240281522 +0100 > @@ -2705,12 +2705,15 @@ scan_trace (dw_trace_info *trace, bool e > dwarf2out_flush_queued_reg_saves (); > } > else if (!NONJUMP_INSN_P (insn) > - || clobbers_queued_reg_save (insn) > || find_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL)) > dwarf2out_flush_queued_reg_saves (); > any_cfis_emitted = false; > > add_cfi_insn = insn; > + > + if (queued_reg_saves.length () && clobbers_queued_reg_save (insn)) > + dwarf2out_flush_queued_reg_saves (); > + > scan_insn_after (insn); > control = insn; > } > > > Jakub >