From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: RFC: pthread pid accessor (BZ# 27880)
To: Florian Weimer
Cc: Libc-alpha, hpa@zytor.com
References: <6d79a213-0df2-be8e-3596-e010f366a34f@linaro.org> <87zgwa979l.fsf@oldenburg.str.redhat.com>
From: Adhemerval Zanella
Date: Tue, 1 Jun 2021 11:09:51 -0300
In-Reply-To: <87zgwa979l.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

On 31/05/2021 13:51, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> It seems that this is trickier than it seems, some issues we might consider
>> first:
>>
>> 1. What should we do with detached threads? As for pthread_kill, issuing
>>    a pthread_gettid_np might use an invalid handler (since the pthread_t
>>    identifier might be reused). This only solution I have is to define
>>    it as undefined behavior, this is not great but to proper support it
>>    would incur to keep tracking or all possible pthread_t identifiers
>>    (we already keep the glibc provided stacks, dl_stack_cache, so it
>>    would be a matter to include the user provided one in the list as
>>    special entries).
>
> Detached threads are fine as long as the thread is still running. This
> is something the application can ensure using synchronization.
>
> There are other interfaces with this property, including pthread_kill.

Afaik pthread_kill on threads created detached, or on threads that call
pthread_detach, is not really defined (the thread ID lifetime ends when
the detach is issued). We even have a bug report for this, BZ #19193 [1].
But currently calling pthread_kill is already undefined: it accesses the
internal tid field without any extra check.
Even using INVALID_NOT_TERMINATED_TD_P/INVALID_TD_P won't really improve
things, since it might still access invalid memory if the thread cache
was empty and the resulting 'struct thread' was deallocated.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=19193

>
>> 2. I think that once we provide this API, developers will start to use to
>>    query if a thread is alive and I am not sure if this is really the
>>    proper API for this. This is the same issue as 1.
>
> They probably use pthread_kill with a zero signal for that today.
> Here's an example for httpd:
>
> | /* deal with a rare timing window which affects waking up the
> |  * listener thread... if the signal sent to the listener thread
> |  * is delivered between the time it verifies that the
> |  * listener_may_exit flag is clear and the time it enters a
> |  * blocking syscall, the signal didn't do any good... work around
> |  * that by sleeping briefly and sending it again
> |  */
> |
> | iter = 0;
> | while (iter < 10 &&
> | #ifdef HAVE_PTHREAD_KILL
> |        pthread_kill(*listener_os_thread, 0)
> | #else
> |        kill(ap_my_pid, 0)
> | #endif
> |        == 0) {
> |     /* listener not dead yet */

Right, I think the newer interface might work for non-detached threads
or threads that are not yet joined.

>
>> 3. How do we handle the concurrent access between pthread_join and
>>    pthread_gettid_np? Once a pthread_join is issued, the pthread_t
>>    identifier might be reused and accessing it should be
>>    invalid. pthread_join first synchronizes using 'joinid' to avoid
>>    concurrent pthread_join and then wait the kernel signal on 'tid'
>>    that the thread has finished. The straightforward
>>    'pthread_gettid_np' implementation would do a atomic load on tid,
>>    however it might read a transient value between pthread_join
>>    'joinid' setup and the futex wait. I am not sure how to handle it
>>    correctly.
>
> The application must ensure through synchronization that the lifetime of
> the thread handle has not ended yet.
> Concurrent calls with pthread_join
> is fine as long as the thread has not exited yet (same as for
> pthread_kill).
>
> The question is what we should do after thread exit, but with a joinable
> thread. I think for that we should return the original TID the kernel
> assigned (even though it could not be reused). That would strongly
> discourage the unsafe probing behavior because the function cannot be
> used to probe if the thread is still running.

Do you mean between the thread cancel/exit and the kernel resetting the
struct thread 'tid' field? The main problem is that the thread might be
detached in between; that's why pthread_join synchronizes first using
the 'joinid' field.

But I think there is not much we can do besides a simple atomic load on
the struct thread 'tid'. Trying to synchronize with 'joinid' won't
really help, since 'pthread_detach' can't fail (not with an intermittent
error). We might try to use either a busy wait or a lock in
pthread_detach and pthread_join over 'joinid', but I don't think this
really solves much without potentially introducing other latency issues.

Peter has suggested returning zero or -1 with ESRCH if the pthread is
detached from its underlying kernel thread, but I think
INVALID_NOT_TERMINATED_TD_P is not valid for detached threads, since the
struct thread ownership might be invalid at the time of the call.

So I think we should just make it undefined behavior and not make any
assumptions.

>
>> Also, MacOSX signature is:
>>
>>   int pthread_gettid_np (pthread_t thread, uint64_t *thread_id)
>>
>> And it returns the current thread identification if THREAD is NULL, returns
>> ESRCH for invalid handle (the 1. and 2. issue below), and also consults
>> the kernel if the identifier can no be obtained.
>
> Macos calls the interface pthread_threadid_np, actually. It looks as if
> it returns a truly unique number that isn't reused within the process or
> system. A Linux TID wouldn't be like that, so I think we should call
> the interface something else.

Fair enough, bionic has

  pid_t pthread_gettid_np(pthread_t t)

So I think it might be an option. It basically returns the underlying
kernel process identifier, with no extra guarantee such as the one the
MacOSX implementation provides.
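
To make the "simple atomic load on tid" idea discussed above concrete,
here is a minimal application-side sketch. It is not glibc or bionic
code: the names tracked_thread, tracked_create, tracked_gettid and
tracked_noop are made up for illustration, and the TID is published by
the thread itself through the raw gettid system call rather than read
from any libc-internal structure. It assumes Linux and C11 atomics.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical application-side wrapper, owned by the caller.  */
struct tracked_thread
{
  pthread_t handle;
  _Atomic pid_t tid;            /* 0 until the thread publishes its TID.  */
  void *(*start) (void *);
  void *arg;
};

/* Runs in the new thread: publish the kernel TID, then call the
   user's start routine.  */
static void *
tracked_trampoline (void *p)
{
  struct tracked_thread *t = p;
  atomic_store_explicit (&t->tid, (pid_t) syscall (SYS_gettid),
                         memory_order_release);
  return t->start (t->arg);
}

/* Returns 0 on success or an error number, like pthread_create.  */
static int
tracked_create (struct tracked_thread *t, void *(*start) (void *),
                void *arg)
{
  assert (t != NULL && start != NULL);
  atomic_init (&t->tid, 0);
  t->start = start;
  t->arg = arg;
  return pthread_create (&t->handle, NULL, tracked_trampoline, t);
}

/* A plain atomic load.  The value is never reset, so it stays at the
   originally assigned TID after the thread exits and cannot be used
   to probe whether the thread is still running.  The caller must
   keep *t alive for as long as it may be queried.  */
static pid_t
tracked_gettid (struct tracked_thread *t)
{
  return atomic_load_explicit (&t->tid, memory_order_acquire);
}

/* Trivial start routine, only here so the sketch can be exercised.  */
static void *
tracked_noop (void *arg)
{
  return arg;
}
```

Because the storage belongs to the application and not to the thread
descriptor, the lifetime question raised for detached threads does not
come up in this sketch; that is a property of the wrapper, not of any
proposed libc interface. A caller that queries before the new thread
has run may still observe 0.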