public inbox for java-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch] Improve/fix gnu.java.net.LineInputStream...
@ 2005-09-12 19:34 David Daney
  2005-09-13  8:15 ` [cp-patches] " Chris Burdess
  0 siblings, 1 reply; 5+ messages in thread
From: David Daney @ 2005-09-12 19:34 UTC (permalink / raw)
  To: Java Patch List; +Cc: classpath-patches

[-- Attachment #1: Type: text/plain, Size: 2178 bytes --]

gnu.java.net.LineInputStream has at least one bug in it, but think its 
whole approach is incorrect.

First the bug:

              len = in.available();
              len = (len < MIN_LENGTH) ? MIN_LENGTH : len;

I think the idea was to read all available bytes into its buffer (but 
not more that the size of the buffer).  However the conditional was 
reversed leading it to try to read more than is available.  This can 
cause LineInputStream to block even if enough data is available to form 
an entire line.

I have not fully researched it but this was causing HTTP connections to 
block for several seconds.  I think under the proper circumstances they 
could block forever, but I am not positive about this.

The main problem I have with LineInputStream, is that in all cases I 
know about it is either reading from a raw socket or a BufferedInputStream.

In the raw socket case LineInputStream reads one character at a time (as 
is required).  If the stream supports mark/reset then LineInputStream 
reads big blocks of data and then resets if it finds that it read too far.

My problem with this is that if the stream supports mark/reset, then it 
is already buffered and additional buffering is unlikely to produce any 
additional benefit.  An as an added bad point you are using more memory 
for the redundant buffer.

I did take the liberty of adding my own micro-optimization, in that if 
the encoding is US-ASCII, we can skip using String's character encoding 
system and just request hibyte of 0.  I did this because a year ago with 
  libgcj-3.4.3 we were seeing a vast increase in speed doing this in a 
different situation.

Note that this micro-optimization is dependent on my 
ByteArrayOutputStream patch.


Bootstrapped and tested with make -k check in libjava on i686-pc-linux-gnu.

2005-09-12  David Daney  <ddaney@avtrex.com>

	* classpath/gnu/java/net/LineInputStream.java (blockReads): Removed.
	(Constructor): Don't initialize blockReads.
	(bufToString): New method.
	(readLine): Removed block reading logic.
	(indexOf): Removed.

I suppose if this is OK that I would commit it to classpath (when my CVS 
access is enabled).

Opinions?


David Daney.

[-- Attachment #2: lis.d --]
[-- Type: text/plain, Size: 5189 bytes --]

Index: classpath/gnu/java/net/LineInputStream.java
===================================================================
RCS file: /cvs/gcc/gcc/libjava/classpath/gnu/java/net/LineInputStream.java,v
retrieving revision 1.1.1.1
diff -c -p -r1.1.1.1 LineInputStream.java
*** classpath/gnu/java/net/LineInputStream.java	16 Jul 2005 00:33:56 -0000	1.1.1.1
--- classpath/gnu/java/net/LineInputStream.java	12 Sep 2005 18:30:39 -0000
***************
*** 1,5 ****
  /* LineInputStream.java --
!    Copyright (C) 2002, 2003, 2004  Free Software Foundation, Inc.
  
  This file is part of GNU Classpath.
  
--- 1,5 ----
  /* LineInputStream.java --
!    Copyright (C) 2002, 2003, 2004, 2005  Free Software Foundation, Inc.
  
  This file is part of GNU Classpath.
  
*************** public class LineInputStream
*** 67,77 ****
    private boolean eof;
  
    /**
-    * Whether we can use block reads.
-    */
-   private final boolean blockReads;
- 
-   /**
     * Constructor using the US-ASCII character encoding.
     * @param in the underlying input stream
     */
--- 67,72 ----
*************** public class LineInputStream
*** 91,99 ****
      buf = new ByteArrayOutputStream();
      this.encoding = encoding;
      eof = false;
-     blockReads = in.markSupported();
    }
  
    /**
     * Read a line of input.
     */
--- 86,107 ----
      buf = new ByteArrayOutputStream();
      this.encoding = encoding;
      eof = false;
    }
  
+ 
+   private String bufToString() throws IOException
+   {
+     if ("US-ASCII".equals(encoding))
+       {
+         return buf.toString(0);
+       }
+     else
+       {
+         return buf.toString(encoding);
+       }
+   }
+     
+ 
    /**
     * Read a line of input.
     */
*************** public class LineInputStream
*** 106,198 ****
        }
      do
        {
!         if (blockReads)
!           {
!             // Use mark and reset to read chunks of bytes
!             final int MIN_LENGTH = 1024;
!             int len, pos;
!             
!             len = in.available();
!             len = (len < MIN_LENGTH) ? MIN_LENGTH : len;
!             byte[] b = new byte[len];
!             in.mark(len);
!             // Read into buffer b
!             len = in.read(b, 0, len);
!             // Handle EOF
!             if (len == -1)
!               {
!                 eof = true;
!                 if (buf.size() == 0)
!                   {
!                     return null;
!                   }
!                 else
!                   {
!                     // We don't care about resetting buf
!                     return buf.toString(encoding);
!                   }
!               }
!             // Get index of LF in b
!             pos = indexOf(b, len, (byte) 0x0a);
!             if (pos != -1)
!               {
!                 // Write pos bytes to buf
!                 buf.write(b, 0, pos);
!                 // Reset stream, and read pos + 1 bytes
!                 in.reset();
!                 pos += 1;
!                 while (pos > 0)
!                   {
!                     len = in.read(b, 0, pos);
!                     pos = (len == -1) ? -1 : pos - len;
!                   }
!                 // Return line
!                 String ret = buf.toString(encoding);
!                 buf.reset();
!                 return ret;
!               }
!             else
!               {
!                 // Append everything to buf and fall through to re-read.
!                 buf.write(b, 0, len);
!               }
!           }
!         else
            {
!             // We must use character reads in order not to read too much
!             // from the underlying stream.
!             int c = in.read();
!             switch (c)
                {
!               case -1:
!                 eof = true;
!                 if (buf.size() == 0)
!                   {
!                     return null;
!                   }
!                 // Fall through and return contents of buffer.
!               case 0x0a:                // LF
!                 String ret = buf.toString(encoding);
!                 buf.reset();
!                 return ret;
!               default:
!                 buf.write(c);
                }
            }
        }
      while (true);
    }
- 
-   private int indexOf(byte[] b, int len, byte c)
-   {
-     for (int pos = 0; pos < len; pos++)
-       {
-         if (b[pos] == c)
-           {
-             return pos;
-           }
-       }
-     return -1;
-   }
  }
  
--- 114,140 ----
        }
      do
        {
!         // We must use character reads in order not to read too much
!         // from the underlying stream.
!         int c = in.read();
!         switch (c)
            {
!           case -1:
!             eof = true;
!             if (buf.size() == 0)
                {
!                 return null;
                }
+             // Fall through and return contents of buffer.
+           case 0x0a:                // LF
+             String ret = bufToString();
+             buf.reset();
+             return ret;
+           default:
+             buf.write(c);
            }
        }
      while (true);
    }
  }
  

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [cp-patches] [Patch] Improve/fix gnu.java.net.LineInputStream...
  2005-09-12 19:34 [Patch] Improve/fix gnu.java.net.LineInputStream David Daney
@ 2005-09-13  8:15 ` Chris Burdess
  2005-09-13 15:54   ` David Daney
  2005-09-13 17:29   ` Tom Tromey
  0 siblings, 2 replies; 5+ messages in thread
From: Chris Burdess @ 2005-09-13  8:15 UTC (permalink / raw)
  To: David Daney; +Cc: classpath-patches, Java Patch List

David Daney wrote:
> gnu.java.net.LineInputStream has at least one bug in it, but think its 
> whole approach is incorrect.
>
> First the bug:
>
>              len = in.available();
>              len = (len < MIN_LENGTH) ? MIN_LENGTH : len;
>
> I think the idea was to read all available bytes into its buffer (but 
> not more that the size of the buffer).  However the conditional was 
> reversed leading it to try to read more than is available.  This can 
> cause LineInputStream to block even if enough data is available to 
> form an entire line.
>
> I have not fully researched it but this was causing HTTP connections 
> to block for several seconds.  I think under the proper circumstances 
> they could block forever, but I am not positive about this.

If that's the case, there is a problem with the underlying stream. len 
is simply initialised to a reasonable value: either 1024 bytes, or more 
if the underlying input stream states that more can be read. The 
underlying stream is not required to read len bytes. It should read as 
many bytes as it can, and return the number of bytes read. The minimum 
value is to prevent LineInputStream trying to allocate a buffer of 0 or 
-1 bytes when the underlying stream doesn't know how many bytes are 
available. We can reduce this minimum value to 1 if it's causing 
problems.

> The main problem I have with LineInputStream, is that in all cases I 
> know about it is either reading from a raw socket or a 
> BufferedInputStream.
>
> In the raw socket case LineInputStream reads one character at a time 
> (as is required).  If the stream supports mark/reset then 
> LineInputStream reads big blocks of data and then resets if it finds 
> that it read too far.
>
> My problem with this is that if the stream supports mark/reset, then 
> it is already buffered and additional buffering is unlikely to produce 
> any additional benefit.  An as an added bad point you are using more 
> memory for the redundant buffer.

What additional buffering? The line buffer? It's hardly redundant, 
since you're not storing the same bytes in two places and we're freeing 
up the underlying stream's buffer to read more bytes. If the line is 
longer than the underlying stream's buffer, or the underlying stream is 
not buffered, it's necessary.

> I did take the liberty of adding my own micro-optimization, in that if 
> the encoding is US-ASCII, we can skip using String's character 
> encoding system and just request hibyte of 0.  I did this because a 
> year ago with  libgcj-3.4.3 we were seeing a vast increase in speed 
> doing this in a different situation.

This micro-optimisation should be applied to 
ByteArrayOutputStream.toString, not here.

> 	* classpath/gnu/java/net/LineInputStream.java (blockReads): Removed.
> 	(Constructor): Don't initialize blockReads.
> 	(bufToString): New method.
> 	(readLine): Removed block reading logic.
> 	(indexOf): Removed.
>
> I suppose if this is OK that I would commit it to classpath (when my 
> CVS access is enabled).
>
> Opinions?

Really bad idea. You're removing block reading capabilities, for what?
-- 
Chris Burdess

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [cp-patches] [Patch] Improve/fix gnu.java.net.LineInputStream...
  2005-09-13  8:15 ` [cp-patches] " Chris Burdess
@ 2005-09-13 15:54   ` David Daney
  2005-09-13 17:29   ` Tom Tromey
  1 sibling, 0 replies; 5+ messages in thread
From: David Daney @ 2005-09-13 15:54 UTC (permalink / raw)
  To: Chris Burdess; +Cc: classpath-patches, Java Patch List

Chris Burdess wrote:
> David Daney wrote:
> 
>> gnu.java.net.LineInputStream has at least one bug in it, but think its 
>> whole approach is incorrect.
>>
>> First the bug:
>>
>>              len = in.available();
>>              len = (len < MIN_LENGTH) ? MIN_LENGTH : len;
>>
>> I think the idea was to read all available bytes into its buffer (but 
>> not more that the size of the buffer).  However the conditional was 
>> reversed leading it to try to read more than is available.  This can 
>> cause LineInputStream to block even if enough data is available to 
>> form an entire line.
>>
>> I have not fully researched it but this was causing HTTP connections 
>> to block for several seconds.  I think under the proper circumstances 
>> they could block forever, but I am not positive about this.
> 
> 
> If that's the case, there is a problem with the underlying stream. len 
> is simply initialised to a reasonable value: either 1024 bytes, or more 
> if the underlying input stream states that more can be read. The 
> underlying stream is not required to read len bytes. It should read as 
> many bytes as it can, and return the number of bytes read. The minimum 
> value is to prevent LineInputStream trying to allocate a buffer of 0 or 
> -1 bytes when the underlying stream doesn't know how many bytes are 
> available. We can reduce this minimum value to 1 if it's causing problems.

O.K.  I grant you that in theory readLine in its current state would not 
block IFF the underlying stream never blocks on reads when data is 
available.

> 
>> The main problem I have with LineInputStream, is that in all cases I 
>> know about it is either reading from a raw socket or a 
>> BufferedInputStream.
>>
>> In the raw socket case LineInputStream reads one character at a time 
>> (as is required).  If the stream supports mark/reset then 
>> LineInputStream reads big blocks of data and then resets if it finds 
>> that it read too far.
>>
>> My problem with this is that if the stream supports mark/reset, then 
>> it is already buffered and additional buffering is unlikely to produce 
>> any additional benefit.  An as an added bad point you are using more 
>> memory for the redundant buffer.
> 
> 
> What additional buffering? The line buffer? It's hardly redundant, since 
> you're not storing the same bytes in two places and we're freeing up the 
> underlying stream's buffer to read more bytes. If the line is longer 
> than the underlying stream's buffer, or the underlying stream is not 
> buffered, it's necessary.
> 

There are two cases.

1) Three buffers:

1a) The buffer in the underlying stream that allows it to be reset.

1b) The buffer (called 'b' in the code) that is read into.

1c) The line buffer (called 'buf' in the code).

2) One buffer : The line buffer.

One could make the argument that you could put a LineInputStream on top 
of a non-buffered resettable InputStream (perhaps a FileInputStream), 
but in practice it is never used in that manner so we don't consider 
this hypothetical case.

Now given that the buffer '1a' exists, buffer '1b' is redundant.

The current code allocates a temporary buffer '1b' with size max(1024, 
in.available()).  If in.available() is large '1b' will be large also.

I am not suggesting changing things just for my own amusement, I am 
experiencing real problems with the existing code.


>> I did take the liberty of adding my own micro-optimization, in that if 
>> the encoding is US-ASCII, we can skip using String's character 
>> encoding system and just request hibyte of 0.  I did this because a 
>> year ago with  libgcj-3.4.3 we were seeing a vast increase in speed 
>> doing this in a different situation.
> 
> 
> This micro-optimisation should be applied to 
> ByteArrayOutputStream.toString, not here.
> 

OK, I will do that given that I have to patch ByteArrayOutputStream as well.

>>
>> Opinions?
> 
> 
> Really bad idea.

Fine, but lets consider this statement:

 > You're removing block reading capabilities,

I am not removing block reading capabilities.  They remain in the 
underlying BufferedInputStream that LineInputStream is attached to.

All I am removing is redundant buffering and related memory allocations.

 > for what?

Reduced resource usage (memory and CPU) in the java runtime.

David Daney

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [cp-patches] [Patch] Improve/fix gnu.java.net.LineInputStream...
  2005-09-13  8:15 ` [cp-patches] " Chris Burdess
  2005-09-13 15:54   ` David Daney
@ 2005-09-13 17:29   ` Tom Tromey
  2005-09-13 17:43     ` David Daney
  1 sibling, 1 reply; 5+ messages in thread
From: Tom Tromey @ 2005-09-13 17:29 UTC (permalink / raw)
  To: Chris Burdess; +Cc: David Daney, classpath-patches, Java Patch List

>>>>> "Chris" == Chris Burdess <dog@bluezoo.org> writes:

>> I did take the liberty of adding my own micro-optimization, in that
>> if the encoding is US-ASCII, we can skip using String's character
>> encoding system and just request hibyte of 0.  I did this because a
>> year ago with  libgcj-3.4.3 we were seeing a vast increase in speed
>> doing this in a different situation.

Chris> This micro-optimisation should be applied to
Chris> ByteArrayOutputStream.toString, not here.

I agree with this approach.

However, I'm not sure this optimization is correct.  Strictly
speaking, we may see a byte >= 0x80, which is not ASCII.  Will the
ASCII converter turn this into '?'?  (I forget what happens here ... I
know Sun has been a bit lax about ASCII handling in some areas,
treating it as identical to 8859-1.)

Tom

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [cp-patches] [Patch] Improve/fix gnu.java.net.LineInputStream...
  2005-09-13 17:29   ` Tom Tromey
@ 2005-09-13 17:43     ` David Daney
  0 siblings, 0 replies; 5+ messages in thread
From: David Daney @ 2005-09-13 17:43 UTC (permalink / raw)
  To: tromey; +Cc: Chris Burdess, classpath-patches, Java Patch List

Tom Tromey wrote:
>>>>>>"Chris" == Chris Burdess <dog@bluezoo.org> writes:
> 
> 
>>>I did take the liberty of adding my own micro-optimization, in that
>>>if the encoding is US-ASCII, we can skip using String's character
>>>encoding system and just request hibyte of 0.  I did this because a
>>>year ago with  libgcj-3.4.3 we were seeing a vast increase in speed
>>>doing this in a different situation.
> 
> 
> Chris> This micro-optimisation should be applied to
> Chris> ByteArrayOutputStream.toString, not here.
> 
> I agree with this approach.
> 
> However, I'm not sure this optimization is correct.  Strictly
> speaking, we may see a byte >= 0x80, which is not ASCII.  Will the
> ASCII converter turn this into '?'?  (I forget what happens here ... I
> know Sun has been a bit lax about ASCII handling in some areas,
> treating it as identical to 8859-1.)
> 

Well I am punting on this question.  We can do it later if deemed 
necessary.  I just committed the ByteArrayOutputStream.toString patch 
without handling this special case.

I think the question for LineInputStream has to be considered in the 
light the the HTTP RFC (2616).  What makes sense for LineInputStream may 
be different than for the more general purpose public API that is 
ByteArrayOutputStream.toString.

Any non ASCII characters in the response/headers are in violation of the 
RFC.  So it probably does not matter what we do, what ever is 
easiest/most efficient is probably best.

David Daney

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2005-09-13 17:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-12 19:34 [Patch] Improve/fix gnu.java.net.LineInputStream David Daney
2005-09-13  8:15 ` [cp-patches] " Chris Burdess
2005-09-13 15:54   ` David Daney
2005-09-13 17:29   ` Tom Tromey
2005-09-13 17:43     ` David Daney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).