To: cygwin@cygwin.com
From: Eric Blake
Subject: Re: bash-3.1-7 BUG
Date: Wed, 13 Sep 2006 14:33:00 -0000

Christopher Faylor writes:

> Is bash assuming that it can read N characters and then subtract M
> characters from the current position to get back to the beginning of a
> line? If so, hmm. I guess this explains why it was reading a byte at a
> time before. It must be counting characters rather than calling lseek
> to figure out where it is.

Yes, indeed, and it seems like reasonable semantics to expect as well
(never mind that it means that text mode on a seekable file involves a
lot more processing, to consistently present the user with a character
count instead of a byte offset).

When a file is seekable, bash reads a buffer at a time for speed, but
then must reseek to the offset where it last processed input before
invoking any subprocesses, since POSIX requires that seekable files be
left in a consistent state when swapping between multiple handles to the
same underlying file description (even if the multiple handles exist in
separate processes).

When using stdio (such as fread and fseek), this works due to code in
newlib (see __SCLE in stdio.h). But bash uses low-level Unix I/O, and
does not benefit from newlib's approach. In a binary mount, seeking
backwards by the number of characters between the point bash has
processed and the end of the buffer it has read just works. It is only
in a text mount, where lseek reports the binary offset within the file
rather than the character offset, that this causes problems.

So I will probably end up reinstating a form of the previous
#ifdef __CYGWIN__ check for is_seekable in bash 3.1-8 to check whether a
file is in text mode, in which case it is treated as non-seekable; that
is certainly a faster solution than waiting for cygwin to change lseek
on a text file to consistently use a character offset. But I intend
that on binary files, the \r of a \r\n line ending will be treated as
part of the line, so at least binary mounts won't suffer from the speed
impact of treating a file as unseekable the way bash 3.1-6 does.

--
Eric Blake
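
To illustrate the reseek described in the message, here is a minimal
sketch in C. It is not bash's actual code: the names
consume_line_and_resync and resync_demo are hypothetical, and the sketch
assumes a POSIX system and a file opened without any text-mode
translation. It reads one buffer, consumes only the first line, then
seeks backwards by the count of unconsumed characters so that a
subprocess sharing the descriptor would start at the second line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from bash: read one buffer from fd,
   consume only the first line, then seek back over the unconsumed tail
   so the shared file offset sits just past that line.  Returns the
   offset after the reseek, or -1 on error or end of file.  */
static off_t
consume_line_and_resync (int fd)
{
  char buf[4096];
  ssize_t nread = read (fd, buf, sizeof buf);
  if (nread <= 0)
    return -1;

  /* Consumed: everything up to and including the first newline, or the
     whole buffer if it holds no newline.  */
  char *nl = memchr (buf, '\n', (size_t) nread);
  ssize_t used = nl ? (ssize_t) (nl - buf) + 1 : nread;

  /* Back up by the number of unconsumed characters.  This is only right
     when one character returned by read equals one byte moved by lseek,
     which is the binary-mount case; \r\n translation in text mode
     breaks that assumption.  */
  return lseek (fd, -(off_t) (nread - used), SEEK_CUR);
}

/* Hypothetical demo: write a two-line script to a temporary file and
   check that the next read after the resync starts at line two.
   Returns 1 on success, 0 otherwise.  */
static int
resync_demo (void)
{
  char path[] = "/tmp/resyncXXXXXX";
  const char script[] = "echo one\necho two\n";
  char rest[sizeof script] = { 0 };

  int fd = mkstemp (path);
  assert (fd >= 0);
  unlink (path);		/* file vanishes once fd is closed */

  ssize_t written = write (fd, script, strlen (script));
  off_t start = lseek (fd, 0, SEEK_SET);
  off_t pos = consume_line_and_resync (fd);
  ssize_t n = read (fd, rest, sizeof rest - 1);
  close (fd);

  return written == (ssize_t) strlen (script) && start == 0
	 && pos == 9 && n == 9 && strcmp (rest, "echo two\n") == 0;
}
```

Calling resync_demo() from a main function should return 1 on a binary
mount. On a text mount the distance passed to lseek would be a character
count while lseek moves by bytes, which is the mismatch discussed above.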