From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list
To: Pedro Alves, gdb@sourceware.org
References: <5ab0b8b1-6072-6717-1ae0-ba06339254b8@oracle.com>
 <0570473c-1181-2269-06a0-0f6d4fc6b178@redhat.com>
 <51ff2398-4a7d-eb07-be98-0ae92673e152@oracle.com>
 <6f4b62a6-3bcc-346e-ac69-a89e98f6dfbe@redhat.com>
 <405d3ffb-ea46-57cb-a023-7dece1983fb6@oracle.com>
 <7fdd8e25-ccd3-00eb-ae30-b9f8c7604fd8@redhat.com>
From: Petr Sumbera
Organization: Oracle Corporation
Message-ID: <1c434ddf-8ce8-8206-e046-66b8dafc4a0d@oracle.com>
Date: Wed, 3 Jun 2020 15:09:14 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.8.1
MIME-Version: 1.0
In-Reply-To: <7fdd8e25-ccd3-00eb-ae30-b9f8c7604fd8@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
List-Id: Gdb mailing list

On 02.06.2020 19:14, Pedro Alves wrote:
> On 6/2/20 5:30 PM, Petr Sumbera wrote:
>
>> I have modified your change to gdb 9.2 and to correct occurrence (you have added it to second occurrence of 'exited'):
>>
>> --- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
>> +++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
>> @@ -2207,9 +2207,10 @@
>>                     if (print_thread_events)
>>                       printf_unfiltered (_("[%s exited]\n"),
>>                                          target_pid_to_str (retval).c_str ());
>> -                   delete_thread (find_thread_ptid (retval));
>> -                   status->kind = TARGET_WAITKIND_SPURIOUS;
>> -                   return retval;
>> +                   thread_info *thr = find_thread_ptid (retval);
>> +                   if (thr)
>> +                     delete_thread (thr);
>> +                   goto wait_again;
>>                   }
>>                 else if (syscall_is_exit (pi, what))
>>                   {
>>
>> But this time exited message repeats forever:
>>
>> [LWP    24         exited]
>> [LWP    24         exited]
>> [LWP    24         exited]
>
> Sounds like the LWP is stuck with the status, or the status is
> cached. We probably need to resume the process to move it out
> of the syscall, I guess. There's this bit in the file, at
> another spot we call goto wait_again:
>
>   /* How to keep going without returning to wfi: */
>   target_continue_no_signal (ptid);
>   goto wait_again;
>
> wfi == wait_for_inferior, the name of a function that used
> to be pretty core in infrun.c. Nowadays handle_inferior_event
> took the role.
>
> Try doing the same. Like:
>
>   delete_thread (find_thread_ptid (this, retval));
>   target_continue_no_signal (ptid);
>   goto wait_again;
>
> You may need to split the delete_thread/find_thread bits, or
> you may not. I'm not sure.
>
> The TARGET_WAITKIND_SPURIOUS handling in infrun.c also
> just calls resume(GDB_SIGNAL_0), so I _think_ this will work as
> well as before. I have no idea how this was supposed to handle
> the case of an LWP exiting while another one is single
> stepping. Looks like we lose the original single-stepping
> request. Maybe. Not sure. But doesn't look like we're
> making things any worse.

This time it looks very promising.
This is the gdb 9.2 patch:

--- gdb-9.2/gdb/procfs.c
+++ gdb-9.2/gdb/procfs.c
@@ -2208,8 +2208,8 @@
                       printf_unfiltered (_("[%s exited]\n"),
                                          target_pid_to_str (retval).c_str ());
                     delete_thread (find_thread_ptid (retval));
-                   status->kind = TARGET_WAITKIND_SPURIOUS;
-                   return retval;
+                   target_continue_no_signal (ptid);
+                   goto wait_again;
                   }
                 else if (syscall_is_exit (pi, what))
                   {

This works for a few test cases. I have also started the gdb test suite to see whether it causes any regressions (it may take some time to run).

But in one particular case it prints the following:

..
[LWP 33 exited1]
[LWP 31 exited1]
[LWP 32 exited1]
[LWP 28 exited1]
[LWP 30 exited1]
[LWP 2 exited1]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb)

It might be related...

Thank you very much!

Petr