From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-f52.google.com (mail-wr1-f52.google.com [209.85.221.52]) by sourceware.org (Postfix) with ESMTPS id CB6383858010 for ; Wed, 14 Jul 2021 00:07:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CB6383858010 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=palves.net Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wr1-f52.google.com with SMTP id g16so883401wrw.5 for ; Tue, 13 Jul 2021 17:07:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=BILbyRedGCb+4JmLudvDlebuU1XUG8NXNshVaIpu190=; b=X/pLr5YilNe5OM0grWqIt5xmFQSJxDIlgFTnq9Azb1OFKZHSQtZ+RTQThwSc6dhh0/ YqSI6cXakLKW0gz8Zl168WvQ4G7NNi998RQV0QDc086SiF79shP3w98+BmJBEMXFigc4 y3D3EUipa1nWcwusJ1i4K1vuI3jDuar20GdF3daN0oB5IFVP9XNhe9n4nrhl5qkia+Wh YEsJ/dipUiuMlSBk0SusK+06jedaLfJqz3inj79fADQccibZw8IBg5tpzZ8alLvivJNr HWnI1WQycD0jcjwVKTeo+BUEo2f7KWybOkmsG4nYKac2SaVT7XSNrJeJMfGNfwnqMAB3 FKFw== X-Gm-Message-State: AOAM531Ge9Dg/ot30WxOoCT9DfzxF70PZDJgfMNOjc8VwizryMa7ASAH hrsSuMZ7LRdh0wk+pMnUuwJ0MHNSK1ehtg== X-Google-Smtp-Source: ABdhPJzrQLYnszVu4tvnUL/tjxKwSpTINzcl8vgxTWjd6xizImgDc+5tIRrf1HM/EirUCpCg9hKaAg== X-Received: by 2002:adf:d0d0:: with SMTP id z16mr9209332wrh.29.1626221225097; Tue, 13 Jul 2021 17:07:05 -0700 (PDT) Received: from ?IPv6:2001:8a0:f932:6a00:46bc:d03b:7b3a:2227? ([2001:8a0:f932:6a00:46bc:d03b:7b3a:2227]) by smtp.gmail.com with ESMTPSA id m15sm432781wmc.20.2021.07.13.17.07.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 13 Jul 2021 17:07:04 -0700 (PDT) Subject: Re: [PATCH 2/2] glibc-2.34: Fix internal error when running gdb.base/gdb-sigterm.exp From: Pedro Alves To: Kevin Buettner Cc: gdb-patches@sourceware.org References: <20210710025129.201884-1-kevinb@redhat.com> <20210710025129.201884-3-kevinb@redhat.com> <20210713163521.7f58013e@f33-m1.lan> Message-ID: Date: Wed, 14 Jul 2021 01:07:03 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210713163521.7f58013e@f33-m1.lan> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, WEIRD_PORT autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Jul 2021 00:07:08 -0000 On 2021-07-14 12:35 a.m., Kevin Buettner wrote: > Pedro Alves wrote: >> This isn't right. The real bug is that inferior_ptid and current_thread_ got >> out of sync. Where did that happen? What set inferior_ptid without using >> switch_to_thread & friends? > > inferior_ptid is temporarily set without changing the current thread in > ps_xfer_memory in proc-service.c: > > static ps_err_e > ps_xfer_memory (const struct ps_prochandle *ph, psaddr_t addr, > gdb_byte *buf, size_t len, int write) > { > scoped_restore_current_inferior restore_inferior; > set_current_inferior (ph->thread->inf); > > scoped_restore_current_program_space restore_current_progspace; > set_current_program_space (ph->thread->inf->pspace); > > scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid); > inferior_ptid = ph->thread->ptid; > > CORE_ADDR core_addr = ps_addr_to_core_addr (addr); > > int ret; > if (write) > ret = target_write_memory (core_addr, buf, len); > else > ret = target_read_memory (core_addr, buf, len); > return (ret == 0 ? PS_OK : PS_ERR); > } > > Under normal circumstances, inferior_ptid is set and then either > target_read_memory() or target_write_memory() are called after > which it'll be reset to its former value when ps_xfer_memory > returns. > > However, should GDB receive a SIGTERM while this is happening, > the QUIT processing code in target_read() will be taken, leading > to an eventual call to inferior_thread where the assert occurs. > > Here's the relevant portion of the backtrace on my F34 w/ glibc-2.33.9000 > machine: > > #18 0x0000000000b6abf2 in internal_error (file=, > line=, fmt=) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdbsupport/errors.cc:55 > #19 0x00000000009b72e2 in inferior_thread () > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/thread.c:72 > #20 0x00000000009b8b2b in any_thread_of_inferior (inf=0x1603a60) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/thread.c:642 > #21 0x00000000009c45b8 in kill_or_detach (inf=0x1603a60, from_tty=0) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/top.c:1677 > #22 0x00000000009c49e7 in quit_force (exit_arg=0x0, from_tty=0) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/top.c:1779 > #23 0x0000000000a37e57 in quit () > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/utils.c:629 > #24 0x0000000000a37eb3 in maybe_quit () > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/utils.c:653 > #25 0x000000000099d8ef in target_read (ops=0x108b9c0 , > object=TARGET_OBJECT_MEMORY, annex=0x0, buf=0x7ffca2d41d90 "", > offset=140737354129392, len=8) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/target.c:2025 > #26 0x000000000099d2e6 in target_read_memory (memaddr=0x7ffff7ffdff0, > myaddr=0x7ffca2d41d90 "", len=8) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/target.c:1810 > #27 0x0000000000854fa0 in ps_xfer_memory (During symbol reading: const value length mismatch for 'sigall_set', got 128, expected 0 > ph=0x1b086d8, addr=0x7ffff7ffdff0, > buf=0x7ffca2d41d90 "", len=8, write=0) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/proc-service.c:90 > #28 0x0000000000855160 in ps_pdread (ph=0x1b086d8, addr=0x7ffff7ffdff0, > buf=0x7ffca2d41d90, size=8) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/proc-service.c:124 > #29 0x00007fc1240151f0 in _td_fetch_value (ta=ta@entry=0x1b08740, > desc=desc@entry=0x1b087f0, descriptor_name=descriptor_name@entry=14, > idx=idx@entry=0x0, address=, > result=result@entry=0x7ffca2d41dc0) at fetch-value.c:115 > #30 0x00007fc124011bdd in td_ta_map_lwp2thr (ta_arg=0x1b08740, lwpid=1452077, > th=0x7ffca2d41f90) at td_ta_map_lwp2thr.c:194 > #31 0x00000000007ab627 in thread_from_lwp (stopped=0x18588b0, ptid=...) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/linux-thread-db.c:416 > #32 0x00000000007ada46 in thread_db_target::wait ( > this=0x108b9c0 , ptid=..., ourstatus=0x7ffca2d42578, > options=...) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/linux-thread-db.c:1431 > #33 0x000000000099e953 in target_wait (ptid=..., status=0x7ffca2d42578, > options=...) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/target.c:2608 > #34 0x0000000000756eaa in do_target_wait_1 (inf=0x1603a60, ptid=..., > status=0x7ffca2d42578, options=...) > at /ironwood1/sourceware-git/f34-2-glibc234/bld/../../worktree-glibc234/gdb/infrun.c:3663 Ah. This is sooo much clearer explained this way. > > The key points are as follows: > > 1) inferior_ptid and current_thread_ are set to null_ptid and nullptr > respectively via the call to switch_to_inferior_no_thread() in > do_target_wait_1(). > 2) Later on, while attempting to figure out the thread corresponding to an > LWP, ps_xfer_memory is called(). > 3) inferior_ptid is set (without also setting current_thread_) within > ps_xfer_memory(). Arrangements are made, however, to have > inferior_ptid's value restored upon return from ps_xfer_memory(). OK, this is one case where we can't use switch_to_thread, because we're in really lower level code, and ps_xfer_memory may be reached even without a thread, and also it's not a good idea to switch threads here because it flushes frame caches. > 4) Lower level memory xfer functions are invoked from ps_xfer_memory(). > 5) A SIGTERM is delivered to GDB; the signal handler is invoked causing > sync_quit_force_run to be set. > 6) The QUIT processing code in target_read_memory notices that sync_quit_force_run > is set. It calls quit() which calls quit_force() which calls kill_or_detach(). > 7) Mayhem (well, an assert) occurs due to inferior_ptid being something other than > null_ptid while current_thread_ is still nullptr. It looks to me that this sync_quit_force_run mechanism as is today is too dangerous. We're quite deep in the target implementation, with all sorts of state half switched around, we can't really assume that it's safe to reenter kill_or_detach (and target_kill/target_detach) like that. Off hand, it seems to me that the right fix is to let the normal quit exception propagate all the way to the top level, and then have the top level call quit_force if sync_quit_force_run is set.