From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2850 invoked by alias); 13 Sep 2005 15:54:46 -0000 Mailing-List: contact java-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: java-patches-owner@gcc.gnu.org Received: (qmail 2825 invoked by uid 22791); 13 Sep 2005 15:54:41 -0000 Received: from adsl-67-116-42-147.dsl.sntc01.pacbell.net (HELO avtrex.com) (67.116.42.147) by sourceware.org (qpsmtpd/0.30-dev) with ESMTP; Tue, 13 Sep 2005 15:54:41 +0000 Received: from [192.168.7.26] ([192.168.7.3]) by avtrex.com with Microsoft SMTPSVC(6.0.3790.1830); Tue, 13 Sep 2005 08:54:36 -0700 Message-ID: <4326F63C.7040607@avtrex.com> Date: Tue, 13 Sep 2005 15:54:00 -0000 From: David Daney User-Agent: Mozilla Thunderbird 1.0.6-1.1.fc3 (X11/20050720) MIME-Version: 1.0 To: Chris Burdess CC: classpath-patches@gnu.org, Java Patch List Subject: Re: [cp-patches] [Patch] Improve/fix gnu.java.net.LineInputStream... References: <4325D839.80500@avtrex.com> <82afe63528c3f2aa3c1c7dacb0f94bad@bluezoo.org> In-Reply-To: <82afe63528c3f2aa3c1c7dacb0f94bad@bluezoo.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2005-q3/txt/msg00350.txt.bz2 Chris Burdess wrote: > David Daney wrote: > >> gnu.java.net.LineInputStream has at least one bug in it, but think its >> whole approach is incorrect. >> >> First the bug: >> >> len = in.available(); >> len = (len < MIN_LENGTH) ? MIN_LENGTH : len; >> >> I think the idea was to read all available bytes into its buffer (but >> not more that the size of the buffer). However the conditional was >> reversed leading it to try to read more than is available. This can >> cause LineInputStream to block even if enough data is available to >> form an entire line. >> >> I have not fully researched it but this was causing HTTP connections >> to block for several seconds. I think under the proper circumstances >> they could block forever, but I am not positive about this. > > > If that's the case, there is a problem with the underlying stream. len > is simply initialised to a reasonable value: either 1024 bytes, or more > if the underlying input stream states that more can be read. The > underlying stream is not required to read len bytes. It should read as > many bytes as it can, and return the number of bytes read. The minimum > value is to prevent LineInputStream trying to allocate a buffer of 0 or > -1 bytes when the underlying stream doesn't know how many bytes are > available. We can reduce this minimum value to 1 if it's causing problems. O.K. I grant you that in theory readLine in its current state would not block IFF the underlying stream never blocks on reads when data is available. > >> The main problem I have with LineInputStream, is that in all cases I >> know about it is either reading from a raw socket or a >> BufferedInputStream. >> >> In the raw socket case LineInputStream reads one character at a time >> (as is required). If the stream supports mark/reset then >> LineInputStream reads big blocks of data and then resets if it finds >> that it read too far. >> >> My problem with this is that if the stream supports mark/reset, then >> it is already buffered and additional buffering is unlikely to produce >> any additional benefit. An as an added bad point you are using more >> memory for the redundant buffer. > > > What additional buffering? The line buffer? It's hardly redundant, since > you're not storing the same bytes in two places and we're freeing up the > underlying stream's buffer to read more bytes. If the line is longer > than the underlying stream's buffer, or the underlying stream is not > buffered, it's necessary. > There are two cases. 1) Three buffers: 1a) The buffer in the underlying stream that allows it to be reset. 1b) The buffer (called 'b' in the code) that is read into. 1c) The line buffer (called 'buf' in the code). 2) One buffer : The line buffer. One could make the argument that you could put a LineInputStream on top of a non-buffered resettable InputStream (perhaps a FileInputStream), but in practice it is never used in that manner so we don't consider this hypothetical case. Now given that the buffer '1a' exists, buffer '1b' is redundant. The current code allocates a temporary buffer '1b' with size max(1024, in.available()). If in.available() is large '1b' will be large also. I am not suggesting changing things just for my own amusement, I am experiencing real problems with the existing code. >> I did take the liberty of adding my own micro-optimization, in that if >> the encoding is US-ASCII, we can skip using String's character >> encoding system and just request hibyte of 0. I did this because a >> year ago with libgcj-3.4.3 we were seeing a vast increase in speed >> doing this in a different situation. > > > This micro-optimisation should be applied to > ByteArrayOutputStream.toString, not here. > OK, I will do that given that I have to patch ByteArrayOutputStream as well. >> >> Opinions? > > > Really bad idea. Fine, but lets consider this statement: > You're removing block reading capabilities, I am not removing block reading capabilities. They remain in the underlying BufferedInputStream that LineInputStream is attached to. All I am removing is redundant buffering and related memory allocations. > for what? Reduced resource usage (memory and CPU) in the java runtime. David Daney