Date: Sun, 10 Mar 2024 15:36:32 -0500 (CDT)
From: Carl Edquist
To: Zachary Santer
cc: libc-alpha@sourceware.org, coreutils@gnu.org, p@draigbrady.com
Subject: Re: RFE: enable buffering on null-terminated data
Message-ID: <317fe0e2-8cf9-d4ac-ed56-e6ebcc2baa55@cs.wisc.edu>

Hi Zack,

This sounds like a potentially useful feature (it'd probably belong with a
corresponding new buffer mode in setbuf(3)) ...

> Filenames should be passed between utilities in a null-terminated
> fashion, because the null byte is the only byte that can't appear within
> one.

Out of curiosity, do you have an example command line for your use case?

> If I want to buffer output data on null bytes, the closest I can get is
> 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> inefficient.

I'm just thinking that find(1), for instance, will end up calling write(2)
exactly once per filename (-print or -print0) if run under stdbuf
unbuffered, which is the same as you'd get with a corresponding stdbuf
line-buffered mode (newline or null-terminated).

It seems that where line buffering improves performance over unbuffered is
when there are several calls to (for example) printf(3) in constructing a
single line. find(1), and some filters like grep(1), will write a line at
a time in unbuffered mode, and thus don't seem to benefit at all from line
buffering. On the other hand, cut(1) appears to putchar(3) a byte at a
time, which in unbuffered mode will (like you say) be pretty inefficient.

So, depending on your use case, a new null-terminated line buffered option
may or may not actually improve efficiency over unbuffered mode.

You can run your commands under strace like

    stdbuf --output=X strace -c -ewrite command ... | ...

to count the number of actual writes for each buffering mode.

Carl

PS, "find -printf" recognizes a '\c' escape to flush the output, in case
that helps. So "find -printf '%p\0\c'" would, for instance, already
behave the same as "stdbuf --output=N find -print0" with the new stdbuf
output mode you're suggesting.

(Though again, this doesn't actually seem to be any more efficient than
running "stdbuf --output=0 find -print0")

On Sun, 10 Mar 2024, Zachary Santer wrote:

> Was "stdbuf feature request - line buffering but for null-terminated data"
>
> See below.
>
> On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady wrote:
>>
>> On 09/03/2024 16:30, Zachary Santer wrote:
>>> 'stdbuf --output=L' will line-buffer the command's output stream.
>>> Pretty useful, but that's looking for newlines. Filenames should be
>>> passed between utilities in a null-terminated fashion, because the
>>> null byte is the only byte that can't appear within one.
>>>
>>> If I want to buffer output data on null bytes, the closest I can get
>>> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
>>> inefficient.
>>>
>>> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
>>> --output=N, then?
>>>
>>> Would this require a change to libc implementations, or is it possible now?
>>
>> This does seem like useful functionality,
>> but it would require support for libc implementations first.
>>
>> cheers,
>> Pádraig
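
For anyone who wants to play with the write-count argument without strace
handy, here is a rough Python model of it. This is only a sketch, not
glibc's stdio code: CountingSink, ModelStream and the two producer
functions are made-up names, and it assumes a null-delimited mode would
flush through the last delimiter seen, much as line buffering does with
newlines. One producer hands over a whole record per call (the way find
appears to), the other a byte per call (the way cut appears to).

```python
class CountingSink:
    """Stands in for the file descriptor; counts simulated write(2) calls."""

    def __init__(self):
        self.writes = 0
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        self.writes += 1
        self.data += chunk


class ModelStream:
    """Toy model of a stdio output stream (an assumption, not glibc's code).

    mode "unbuffered": every put() goes straight to the sink.
    mode "delim":      data is held back and flushed through the last
                       delimiter byte seen, or when `size` bytes pile up.
    """

    def __init__(self, sink, mode, delim=b"\0", size=4096):
        self.sink, self.mode, self.delim, self.size = sink, mode, delim, size
        self.buf = bytearray()

    def put(self, chunk: bytes) -> None:
        if self.mode == "unbuffered":
            if chunk:
                self.sink.write(chunk)
            return
        self.buf += chunk
        cut = self.buf.rfind(self.delim) + 1  # 0 when no delimiter is buffered
        if cut:
            self.sink.write(bytes(self.buf[:cut]))
            del self.buf[:cut]
        if len(self.buf) >= self.size:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.buf.clear()


def emit_whole_records(stream, names):
    """find-like producer: one call per null-terminated record."""
    for name in names:
        stream.put(name + b"\0")


def emit_bytewise(stream, names):
    """cut-like producer: one call per byte, as with putchar(3)."""
    for name in names:
        for byte in name + b"\0":
            stream.put(bytes([byte]))


def count_writes(producer, mode, names):
    sink = CountingSink()
    stream = ModelStream(sink, mode)
    producer(stream, names)
    stream.flush()  # mimic the flush at exit
    return sink.writes


# Sample filenames (illustrative only), including one with a newline in it.
NAMES = [b"a.txt", b"dir/b c.txt", b"new\nline"]

if __name__ == "__main__":
    for producer in (emit_whole_records, emit_bytewise):
        for mode in ("unbuffered", "delim"):
            print(producer.__name__, mode, count_writes(producer, mode, NAMES))
```

With these three names (27 bytes including the terminators), the
whole-record producer costs 3 simulated writes in either mode, while the
bytewise producer costs 27 unbuffered and 3 with delimiter buffering. So in
this model the new mode only pays off for programs that build a record out
of several small calls.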