Date: Wed, 8 May 2024 11:54:21 -0700 (PDT)
From: Jeremy Drake
To: cygwin-developers@cygwin.com
cc: Johannes.Schindelin@gmx.de
Subject: double-fork issue on Windows on ARM64
Message-ID: <78f294de-4c94-242a-722e-fd98e51edff9@jdrake.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0-1190799409-1715194463=:59552"

This message is in MIME format.  The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--0-1190799409-1715194463=:59552
Content-Type: text/plain; charset=US-ASCII

(this is the same issue discussed in
https://cygwin.com/pipermail/cygwin-patches/2024q1/012621.html)

On MSYS2, running on Windows on ARM64 only, we've been plagued by issues
with processes hanging up.  Usually pacman, when it is trying to validate
signatures with gpgme.  When a process is hung in this way, no debugger
seems to be able to attach properly.

After many months of off-and-on progress trying to debug this, we've
*finally* got an idea of what behavior is causing this, and a standalone
reproducer that runs on Cygwin.

> A common symptom is that the hanging process has a command-line that is
> identical to its parent process' command-line (indicating that it has
> been fork()ed), and anecdotally, the hang occurs when _exit() calls
> proc_terminate() which is then blocked by a call to TerminateThread()
> with an invalid thread handle (for more details, see
> https://github.com/msys2/msys2-autobuild/issues/62#issuecomment-1951796327).
>
> In my tests, I found that the hanging process is spawned from
> _gpgme_io_spawn() which lets the child process immediately spawn another
> child. That seems like a fantastic way to find timing-related bugs in
> the MSYS2/Cygwin runtime.
>
> As a work-around, it does seem to help if we avoid that double-fork.

That led me to make the attached reproducer, which is based on the code
from _gpgme_io_spawn.  I originally expected that this would require some
timing adjustment, hence the defines to change the binary and argument (I
expected to use /bin/sleep and different values).  It turns out that this
reproduces readily with /bin/true.

I build this with `gcc -ggdb -o testfork testfork.c`, and this reproduces:

* on a Raspberry Pi 4 running Windows 10, with an i686 msys2 runtime
* on a QC710 running Windows 11 23H2, with x86_64 msys2 runtime (this
  seems to reproduce it most readily).
* on a Hyper-V virtual machine on Dev Kit 2023 running Windows 11 23H2,
  with x86_64 msys2 runtime or Cygwin 3.5.3.  This seems to require
  running two instances of testfork.exe at the same time.

When attaching to the hung process, gdb shows

(gdb) i thr
  Id   Target Id                      Frame
  1    Thread 6516.0xbe8 error return /cygdrive/d/a/scallywag/gdb/gdb-13.2-1.x86_64/src/gdb-13.2/gdb/windows-nat.c:748 was 31: A device attached to the system is not functioning.
0x0000000000000000 in ?? ()
  2    Thread 6516.0x1b28 "sig"       0x00007ff8051a8a64 in ?? ()
* 3    Thread 6516.0x12b4             0x00007ff8051b4374 in ?? ()

Let me know if I can provide any additional info, or anything else we can
try to help debug this.
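
For comparison with the double-fork pattern described above, here is a
minimal sketch of what "avoiding that double-fork" could look like: one
fork(), an execv() in the child, and the caller reaping that child itself.
This is only an illustration under stated assumptions, not the actual
gpgme or MSYS2 change. The helper names spawn_single_fork and run_group
are hypothetical, and the exit code 127 for a failed execv() is a
convention chosen here, not something taken from the reproducer.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of gpgme, Cygwin, or testfork.c):
 * run `binary` using a single fork() and wait for that child directly,
 * so there is no intermediate process that forks and exits at once.
 * Returns the child's exit code, or -1 if fork()/waitpid() failed or
 * the child did not exit normally. */
int spawn_single_fork(const char *binary, char *const args[])
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork error");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the process image; only returns on failure. */
        execv(binary, args);
        perror("execv failed");
        _exit(127); /* assumed convention for "could not exec" */
    }

    /* Parent: reap the one child we created. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid error");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: run `count` single-fork spawns back to back and
 * return how many of them exited with code 0. */
int run_group(const char *binary, char *const args[], int count)
{
    int ok = 0;
    for (int i = 0; i < count; ++i) {
        if (spawn_single_fork(binary, args) == 0)
            ++ok;
    }
    return ok;
}
```

Calling run_group("/bin/true", args, 100) with args = {"/bin/true", NULL}
would mirror one "group" of the reproducer without the intermediate
process. The trade-off is that the caller now blocks until the spawned
program itself finishes, whereas with the double-fork the caller only
waits for the short-lived intermediate child, so this shape is only a
drop-in replacement where waiting for the program is acceptable.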
--0-1190799409-1715194463=:59552
Content-Type: text/x-c; name=testfork.c
Content-Transfer-Encoding: 7bit
Content-ID: <3ad86fa3-ec30-90b5-63d1-beae2d8a0947@jdrake.com>
Content-Description:
Content-Disposition: attachment; filename=testfork.c

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef BINARY
#define BINARY "/bin/true"
#endif

#ifndef ARG
#define ARG "0.1"
#endif

int main(int argc, char ** argv)
{
	while (1)
	{
		int pid;
		printf("Starting group of 100x " BINARY " " ARG "\n");
		for (int i = 0; i < 100; ++i)
		{
			pid = fork();
			if (pid == -1)
			{
				perror("fork error");
				return 1;
			}
			else if (pid == 0)
			{
				if ((pid = fork()) == 0)
				{
					char * const args[] = {BINARY, ARG, NULL};
					execv(BINARY, args);
					perror("execv failed");
					_exit(5);
				}
				if (pid == -1)
				{
					perror("inner fork error");
					_exit(1);
				}
				else
				{
					_exit(0);
				}
			}
			else
			{
				int status;
				if (waitpid(pid, &status, 0) == -1)
				{
					perror("waitpid error");
					return 2;
				}
				else if (status != 0)
				{
					fprintf(stderr, "subprocess exited non-zero: %d\n", status);
					return WEXITSTATUS(status);
				}
			}
		}
	}
	return 0;
}

--0-1190799409-1715194463=:59552--