From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zimbra.cs.ucla.edu (zimbra.cs.ucla.edu [131.179.128.68]) by sourceware.org (Postfix) with ESMTPS id 1FD943847814 for ; Thu, 9 Jun 2022 20:09:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1FD943847814 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=cs.ucla.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=cs.ucla.edu Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 68217160216; Thu, 9 Jun 2022 13:09:00 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id hUq0GDh_T5OR; Thu, 9 Jun 2022 13:08:59 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 7D907160217; Thu, 9 Jun 2022 13:08:59 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id LkcSa1J7P-sm; Thu, 9 Jun 2022 13:08:59 -0700 (PDT) Received: from [192.168.1.9] (cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 54419160216; Thu, 9 Jun 2022 13:08:59 -0700 (PDT) Message-ID: Date: Thu, 9 Jun 2022 13:08:58 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Content-Language: en-US To: Florian Weimer Cc: libc-alpha@sourceware.org References: <871qxbxe2i.fsf@oldenburg.str.redhat.com> <577d0656-8b38-07d8-7b48-01870d3730c7@cs.ucla.edu> <87r13y5lth.fsf@oldenburg.str.redhat.com> From: Paul Eggert Organization: UCLA Computer Science Department Subject: Re: [PATCH] stdio-common: Add the fgetln function In-Reply-To: <87r13y5lth.fsf@oldenburg.str.redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jun 2022 20:09:02 -0000 On 6/9/22 00:37, Florian Weimer wrote: > * Paul Eggert: > >> If the stream is not already oriented, FreeBSD getln sets the stream >> to byte-orientation. Should glibc getln do the same? > > Our getdelim doesn't do that explicitly. I raised the issue because one motivation for adding fgetln is to be compatible with FreeBSD. Although the orientation issue is secondary and can be detached from the main issue of adding fgetln, it might be helpful to address it while fgetln is being added (assuming it's added) rather than later. Perhaps we'll decide that neither fgetln nor getdelim should change orientation, i.e., we're deliberately incompatible with FreeBSD. That's OK too. >> On 5/3/22 00:36, Florian Weimer via Libc-alpha wrote: >>> + /* Discard the old buffer. This optimizes for a buffered stream, >>> + with multiple lines in each buffer. */ >>> + if (fp->_fgetln_buf != NULL) >>> + { >>> + free (fp->_fgetln_buf); >>> + fp->_fgetln_buf = NULL; >>> + } >> >> Hope you don't mind a bit of bikeshedding here.... >> >> Why free the fgetln buffer eagerly? Instead, free it only when >> closing. That would lessen pressure on the memory allocator and would >> save a few insns in fgetln's usual case. > > The assumption is that very few lines cross buffer boundaries. Yes, but the above test is run every time fgetln is called. It's faster to free only when closing, even if no lines cross buffer boundaries. The code must free when closing anyway, so this doesn't slow down closing. (But see below, as we may not need that buffer at all.) >> Come to think of it, how about if we restrict fgetln to streams for >> which either (1) the user has not called setvbuf with a nonnull >> buffer, or (2) the input line fits in the user-supplied setvbuf >> buffer. > > It would require a layering violation as far as libio is concerned: a > high-level function such as fgetln cannot reallocate the read buffer. > You mention setvbuf, but there are probably other cases (and of course > GCC 2.95 C++ classes, but we don't need to worry about compatibility > with those, I think). It's OK to violate layering if the performance improvement is worth it, which I would guess it would be for apps reading long lines. > I'm not sure if it's more efficient. The I/O block granularity would > change depending on where lines end. Can't we arrange for I/O blocking to be respected as the buffer grows? fgetln shouldn't need to stop reading the instant it sees a newline; it can read with the same blocksize it always does. My sense is that a one-buffer solution is more efficient than two buffers, where data are copied from one into the other. Of course I haven't measured this though.