public inbox for cygwin-patches@cygwin.com
 help / color / mirror / Atom feed
* /dev/clipboard pasting with small read() buffer
@ 2012-08-14 20:56 Thomas Wolff
  2012-08-16  9:34 ` Corinna Vinschen
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Wolff @ 2012-08-14 20:56 UTC (permalink / raw)
  To: cygwin-patches

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: clipboard-small-buffer.patch --]
[-- Type: text/plain, Size: 2511 bytes --]

--- sav/fhandler_clipboard.cc	2012-07-08 02:36:47.000000000 +0200
+++ ./fhandler_clipboard.cc	2012-08-14 18:25:14.903255600 +0200
@@ -222,6 +222,7 @@ fhandler_dev_clipboard::read (void *ptr,
   UINT formatlist[2];
   int format;
   LPVOID cb_data;
+  int rach;
 
   if (!OpenClipboard (NULL))
     {
@@ -243,12 +244,18 @@ fhandler_dev_clipboard::read (void *ptr,
       cygcb_t *clipbuf = (cygcb_t *) cb_data;
 
       if (pos < clipbuf->len)
-      	{
+	{
 	  ret = ((len > (clipbuf->len - pos)) ? (clipbuf->len - pos) : len);
 	  memcpy (ptr, clipbuf->data + pos , ret);
 	  pos += ret;
 	}
     }
+  else if ((rach = get_readahead ()) >= 0)
+    {
+      /* Deliver from read-ahead buffer. */
+      * (char *) ptr = rach;
+      ret = 1;
+    }
   else
     {
       wchar_t *buf = (wchar_t *) cb_data;
@@ -256,25 +263,46 @@ fhandler_dev_clipboard::read (void *ptr,
       size_t glen = GlobalSize (hglb) / sizeof (WCHAR) - 1;
       if (pos < glen)
 	{
+	  /* If caller's buffer is too small to hold at least one 
+	     max-size character, redirect algorithm to local 
+	     read-ahead buffer, finally fill class read-ahead buffer 
+	     with result and feed caller from there. */
+	  char * _ptr = (char *) ptr;
+	  size_t _len = len;
+	  char cprabuf [8 + 1];	/* need this length for surrogates */
+	  if (len < 8)
+	    {
+	      _ptr = cprabuf;
+	      _len = 8;
+	    }
+
 	  /* Comparing apples and oranges here, but the below loop could become
 	     extremly slow otherwise.  We rather return a few bytes less than
 	     possible instead of being even more slow than usual... */
-	  if (glen > pos + len)
-	    glen = pos + len;
+	  if (glen > pos + _len)
+	    glen = pos + _len;
 	  /* This loop is necessary because the number of bytes returned by
 	     sys_wcstombs does not indicate the number of wide chars used for
 	     it, so we could potentially drop wide chars. */
 	  while ((ret = sys_wcstombs (NULL, 0, buf + pos, glen - pos))
 		  != (size_t) -1
-		 && ret > len)
+		 && ret > _len)
 	     --glen;
 	  if (ret == (size_t) -1)
 	    ret = 0;
 	  else
 	    {
-	      ret = sys_wcstombs ((char *) ptr, (size_t) -1,
+	      ret = sys_wcstombs ((char *) _ptr, (size_t) -1,
 				  buf + pos, glen - pos);
 	      pos = glen;
+	      /* If using read-ahead buffer, copy to class read-ahead buffer
+	         and deliver first byte. */
+	      if (_ptr == cprabuf)
+		{
+		  puts_readahead (cprabuf, ret);
+		  * (char *) ptr = get_readahead ();
+		  ret = 1;
+		}
 	    }
 	}
     }

[-- Attachment #3: clipboard-small-buffer.changelog --]
[-- Type: text/plain, Size: 214 bytes --]

2012-08-14  Thomas Wolff  <towo@towo.net>

	* fhandler_clipboard.cc (fhandler_dev_clipboard::read): Use 
	read-ahead buffer for reading Windows clipboard if caller's 
	buffer is too small for complete characters.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-14 20:56 /dev/clipboard pasting with small read() buffer Thomas Wolff
@ 2012-08-16  9:34 ` Corinna Vinschen
  2012-08-16 12:12   ` Thomas Wolff
  0 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2012-08-16  9:34 UTC (permalink / raw)
  To: cygwin-patches

Hi Thomas,

thanks for the patch.   I have a few minor nits:

On Aug 14 22:56, Thomas Wolff wrote:
> --- sav/fhandler_clipboard.cc	2012-07-08 02:36:47.000000000 +0200
> +++ ./fhandler_clipboard.cc	2012-08-14 18:25:14.903255600 +0200
> @@ -222,6 +222,7 @@ fhandler_dev_clipboard::read (void *ptr,
>    UINT formatlist[2];
>    int format;
>    LPVOID cb_data;
> +  int rach;
>  
>    if (!OpenClipboard (NULL))
>      {
> @@ -243,12 +244,18 @@ fhandler_dev_clipboard::read (void *ptr,
>        cygcb_t *clipbuf = (cygcb_t *) cb_data;
>  
>        if (pos < clipbuf->len)
> -      	{
> +	{
>  	  ret = ((len > (clipbuf->len - pos)) ? (clipbuf->len - pos) : len);
>  	  memcpy (ptr, clipbuf->data + pos , ret);
>  	  pos += ret;
>  	}
>      }
> +  else if ((rach = get_readahead ()) >= 0)
> +    {
> +      /* Deliver from read-ahead buffer. */
> +      * (char *) ptr = rach;
> +      ret = 1;

See (*) below.

> +    }
>    else
>      {
>        wchar_t *buf = (wchar_t *) cb_data;
> @@ -256,25 +263,46 @@ fhandler_dev_clipboard::read (void *ptr,
>        size_t glen = GlobalSize (hglb) / sizeof (WCHAR) - 1;
>        if (pos < glen)
>  	{
> +	  /* If caller's buffer is too small to hold at least one 
> +	     max-size character, redirect algorithm to local 
> +	     read-ahead buffer, finally fill class read-ahead buffer 
> +	     with result and feed caller from there. */
> +	  char * _ptr = (char *) ptr;
> +	  size_t _len = len;

I would prefer to have local variable names here which don't just
differ by a leading underscore.  It's a bit confusing.  What about,
say, tmp_ptr/tmp_len, or use_ptr/use_len or something like that?

> +	  char cprabuf [8 + 1];	/* need this length for surrogates */
> +	  if (len < 8)
> +	    {
> +	      _ptr = cprabuf;
> +	      _len = 8;
> +	    }

8?  Why 8?  The size appears to be rather artificial.  The code should
use MB_CUR_MAX instead.

> +
>  	  /* Comparing apples and oranges here, but the below loop could become
>  	     extremly slow otherwise.  We rather return a few bytes less than
>  	     possible instead of being even more slow than usual... */
> -	  if (glen > pos + len)
> -	    glen = pos + len;
> +	  if (glen > pos + _len)
> +	    glen = pos + _len;
>  	  /* This loop is necessary because the number of bytes returned by
>  	     sys_wcstombs does not indicate the number of wide chars used for
>  	     it, so we could potentially drop wide chars. */
>  	  while ((ret = sys_wcstombs (NULL, 0, buf + pos, glen - pos))
>  		  != (size_t) -1
> -		 && ret > len)
> +		 && ret > _len)
>  	     --glen;
>  	  if (ret == (size_t) -1)
>  	    ret = 0;
>  	  else
>  	    {
> -	      ret = sys_wcstombs ((char *) ptr, (size_t) -1,
> +	      ret = sys_wcstombs ((char *) _ptr, (size_t) -1,
>  				  buf + pos, glen - pos);
>  	      pos = glen;
> +	      /* If using read-ahead buffer, copy to class read-ahead buffer
> +	         and deliver first byte. */
> +	      if (_ptr == cprabuf)
> +		{
> +		  puts_readahead (cprabuf, ret);
> +		  * (char *) ptr = get_readahead ();
> +		  ret = 1;

(*) Ok, that works, but wouldn't it be more efficient to do that in
a tiny loop along the lines of

		  int x;
		  ret = 0;
                  while (ret < len && (x = get_readahead ()) >= 0)
		    ptr++ = x;
		    ret++;

?


Corinna


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-16  9:34 ` Corinna Vinschen
@ 2012-08-16 12:12   ` Thomas Wolff
  2012-08-16 12:31     ` Corinna Vinschen
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Wolff @ 2012-08-16 12:12 UTC (permalink / raw)
  To: cygwin-patches

[-- Attachment #1: Type: text/plain, Size: 3110 bytes --]

Hi Corinna,

On 16.08.2012 11:33, Corinna Vinschen wrote:
> Hi Thomas,
>
> thanks for the patch.   I have a few minor nits:
>
> On Aug 14 22:56, Thomas Wolff wrote:
>> --- sav/fhandler_clipboard.cc	2012-07-08 02:36:47.000000000 +0200
>> +++ ./fhandler_clipboard.cc	2012-08-14 18:25:14.903255600 +0200
>> ...
> See (*) below.
>
>> ...
>> +	  char * _ptr = (char *) ptr;
>> +	  size_t _len = len;
> I would prefer to have local variable names here which don't just
> differ by a leading underscore.  It's a bit confusing.  What about,
> say, tmp_ptr/tmp_len, or use_ptr/use_len or something like that?
tmp_OK

>> +	  char cprabuf [8 + 1];	/* need this length for surrogates */
>> +	  if (len < 8)
>> +	    {
>> +	      _ptr = cprabuf;
>> +	      _len = 8;
>> +	    }
> 8?  Why 8?  The size appears to be rather artificial.  The code should
> use MB_CUR_MAX instead.
MB_CUR_MAX does not work because its value is 1 at this point
(after adding #include <stdlib.h> anyway).
I guess that's because it changes its value after setlocale().
To avoid interference with application invocation of setlocale(),
however, it must not be called here (with which parameters anyway...).

So the maximum length of all encodings needs to be used here which
would be 4 (UTF-8, EUC-TW, GB 18030) if there were not the surrogate
pairs of UTF-16.
Since a single surrogate can be encoded with 3 UTF-8 bytes, I tried
6 to host two of them but it didn't work. So I increased to 8
which seemed to work; I may have been mislead, however, since another
test case now shows that the patch does not always work with surrogates.
It only works if the non-BMP character (the surrogate pair) does not
extend over the border of two blocks of multiple size of these internal
small buffers..., meaning e.g. if the buffer size is 8 and a non-BMP
character starts at byte 7 of the clipboard, it will fail (only the
second surrogate will be pasted).
I don't know what to do about this right now; maybe some tweak
of sys_wcstombs would help, it should perhaps convert to non-surrogate
UTF-8 first before any other buffering considerations.

--- Actually, having written all this, I just notice that surrogate pairs
don't work with the "old" code either (using a larger caller buffer)
whenever the character is not block-aligned as described above.
So maybe for the current patch, 4 would indeed be good, while the
surrogate problem should be fixed in sys_wcstombs.
Update attached.

>> ...
>> +	      /* If using read-ahead buffer, copy to class read-ahead buffer
>> +	         and deliver first byte. */
>> +	      if (_ptr == cprabuf)
>> +		{
>> +		  puts_readahead (cprabuf, ret);
>> +		  * (char *) ptr = get_readahead ();
>> +		  ret = 1;
> (*) Ok, that works, but wouldn't it be more efficient to do that in
> a tiny loop along the lines of
>
> 		  int x;
> 		  ret = 0;
>                    while (ret < len && (x = get_readahead ()) >= 0)
> 		    ptr++ = x;
> 		    ret++;
>
> ?
I can add it if you prefer; I just didn't think it's worth the effort 
and concerning efficiency, after that prior trial-and-error 
count-down-loop...

------
Thomas

[-- Attachment #2: clipboard-small-buffer.patch.2 --]
[-- Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-16 12:12   ` Thomas Wolff
@ 2012-08-16 12:31     ` Corinna Vinschen
  2012-08-16 14:21       ` Thomas Wolff
  0 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2012-08-16 12:31 UTC (permalink / raw)
  To: cygwin-patches

On Aug 16 14:11, Thomas Wolff wrote:
> Hi Corinna,
> 
> On 16.08.2012 11:33, Corinna Vinschen wrote:
> >Hi Thomas,
> >
> >thanks for the patch.   I have a few minor nits:
> >
> >On Aug 14 22:56, Thomas Wolff wrote:
> >>--- sav/fhandler_clipboard.cc	2012-07-08 02:36:47.000000000 +0200
> >>+++ ./fhandler_clipboard.cc	2012-08-14 18:25:14.903255600 +0200
> >>...
> >See (*) below.
> >
> >>...
> >>+	  char * _ptr = (char *) ptr;
> >>+	  size_t _len = len;
> >I would prefer to have local variable names here which don't just
> >differ by a leading underscore.  It's a bit confusing.  What about,
> >say, tmp_ptr/tmp_len, or use_ptr/use_len or something like that?
> tmp_OK
> 
> >>+	  char cprabuf [8 + 1];	/* need this length for surrogates */
> >>+	  if (len < 8)
> >>+	    {
> >>+	      _ptr = cprabuf;
> >>+	      _len = 8;
> >>+	    }
> >8?  Why 8?  The size appears to be rather artificial.  The code should
> >use MB_CUR_MAX instead.
> MB_CUR_MAX does not work because its value is 1 at this point

So what about MB_LEN_MAX then?  There's no problem using a multiplier,
but a symbolic constant is always better than a numerical constant.

> >>+	      /* If using read-ahead buffer, copy to class read-ahead buffer
> >>+	         and deliver first byte. */
> >>+	      if (_ptr == cprabuf)
> >>+		{
> >>+		  puts_readahead (cprabuf, ret);
> >>+		  * (char *) ptr = get_readahead ();
> >>+		  ret = 1;
> >(*) Ok, that works, but wouldn't it be more efficient to do that in
> >a tiny loop along the lines of
> >
> >		  int x;
> >		  ret = 0;
> >                   while (ret < len && (x = get_readahead ()) >= 0)
> >		    ptr++ = x;
> >		    ret++;
> >
> >?
> I can add it if you prefer; I just didn't think it's worth the
> effort and concerning efficiency, after that prior trial-and-error
> count-down-loop...

Yeah, that's a valid point.  But maybe we shouldn't make it slower
than necessary?  If you have a good idea how to avoid the other
loop, don't hesitate to submit a patch.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-16 12:31     ` Corinna Vinschen
@ 2012-08-16 14:21       ` Thomas Wolff
  2012-08-16 15:24         ` Eric Blake
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Wolff @ 2012-08-16 14:21 UTC (permalink / raw)
  To: cygwin-patches

[-- Attachment #1: Type: text/plain, Size: 2305 bytes --]

On 16.08.2012 14:30, Corinna Vinschen wrote:
> On Aug 16 14:11, Thomas Wolff wrote:
>> Hi Corinna,
>>
>> On 16.08.2012 11:33, Corinna Vinschen wrote:
>>> Hi Thomas,
>>>
>>> thanks for the patch.   I have a few minor nits:
>>>
>>> On Aug 14 22:56, Thomas Wolff wrote:
>>> ... 
>>>> +	  char cprabuf [8 + 1];	/* need this length for surrogates */
>>>> +	  if (len < 8)
>>>> +	    {
>>>> +	      _ptr = cprabuf;
>>>> +	      _len = 8;
>>>> +	    }
>>> 8?  Why 8?  The size appears to be rather artificial.  The code should
>>> use MB_CUR_MAX instead.
>> MB_CUR_MAX does not work because its value is 1 at this point
> So what about MB_LEN_MAX then?  There's no problem using a multiplier,
> but a symbolic constant is always better than a numerical constant.
I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from 
limits.h (note the "_" distinction :) ),
because the latter, by its preceding comment, reserves the option to be 
changed into a dynamic function in the future, which could then possibly 
have the same problems as MB_CUR_MAX.

About the surrogates problem, I think I've found a solution:
I've added an explicit test to avoid processing of split surrogate pairs 
(to that loop...); this seems to work now.

>>>> +	      /* If using read-ahead buffer, copy to class read-ahead buffer
>>>> +	         and deliver first byte. */
>>>> +	      if (_ptr == cprabuf)
>>>> +		{
>>>> +		  puts_readahead (cprabuf, ret);
>>>> +		  * (char *) ptr = get_readahead ();
>>>> +		  ret = 1;
>>> (*) Ok, that works, but wouldn't it be more efficient to do that in
>>> a tiny loop along the lines of
>>>
>>> 		  int x;
>>> 		  ret = 0;
>>>                    while (ret < len && (x = get_readahead ()) >= 0)
>>> 		    ptr++ = x;
>>> 		    ret++;
>>>
>>> ?
>> I can add it if you prefer; I just didn't think it's worth the
>> effort and concerning efficiency, after that prior trial-and-error
>> count-down-loop...
> Yeah, that's a valid point.  But maybe we shouldn't make it slower
> than necessary?  If you have a good idea how to avoid the other
> loop, don't hesitate to submit a patch.
Added the loop to use up the caller's buffer.
About avoiding the trial-and-error loop, I think that would require 
digging into sys_mbstowcs (which doesn't even seem to behave as documented).

------
Thomas

[-- Attachment #2: clipboard-small-buffer.patch.3 --]
[-- Type: text/plain, Size: 2985 bytes --]

--- sav/fhandler_clipboard.cc	2012-07-08 02:36:47.000000000 +0200
+++ ./fhandler_clipboard.cc	2012-08-16 16:08:23.782692300 +0200
@@ -222,6 +222,7 @@ fhandler_dev_clipboard::read (void *ptr,
   UINT formatlist[2];
   int format;
   LPVOID cb_data;
+  int rach;
 
   if (!OpenClipboard (NULL))
     {
@@ -243,12 +244,24 @@ fhandler_dev_clipboard::read (void *ptr,
       cygcb_t *clipbuf = (cygcb_t *) cb_data;
 
       if (pos < clipbuf->len)
-      	{
+	{
 	  ret = ((len > (clipbuf->len - pos)) ? (clipbuf->len - pos) : len);
 	  memcpy (ptr, clipbuf->data + pos , ret);
 	  pos += ret;
 	}
     }
+  else if ((rach = get_readahead ()) >= 0)
+    {
+      /* Deliver from read-ahead buffer. */
+      char * out_ptr = (char *) ptr;
+      * out_ptr++ = rach;
+      ret = 1;
+      while (ret < len && (rach = get_readahead ()) >= 0)
+	{
+	  * out_ptr++ = rach;
+	  ret++;
+	}
+    }
   else
     {
       wchar_t *buf = (wchar_t *) cb_data;
@@ -256,25 +269,54 @@ fhandler_dev_clipboard::read (void *ptr,
       size_t glen = GlobalSize (hglb) / sizeof (WCHAR) - 1;
       if (pos < glen)
 	{
+	  /* If caller's buffer is too small to hold at least one 
+	     max-size character, redirect algorithm to local 
+	     read-ahead buffer, finally fill class read-ahead buffer 
+	     with result and feed caller from there. */
+	  char * conv_ptr = (char *) ptr;
+	  size_t conv_len = len;
+#define cprabuf_len _MB_LEN_MAX	/* newlib's max MB_CUR_MAX of all encodings */
+	  char cprabuf [cprabuf_len];
+	  if (len < cprabuf_len)
+	    {
+	      conv_ptr = cprabuf;
+	      conv_len = cprabuf_len;
+	    }
+
 	  /* Comparing apples and oranges here, but the below loop could become
 	     extremly slow otherwise.  We rather return a few bytes less than
 	     possible instead of being even more slow than usual... */
-	  if (glen > pos + len)
-	    glen = pos + len;
+	  if (glen > pos + conv_len)
+	    glen = pos + conv_len;
 	  /* This loop is necessary because the number of bytes returned by
 	     sys_wcstombs does not indicate the number of wide chars used for
 	     it, so we could potentially drop wide chars. */
 	  while ((ret = sys_wcstombs (NULL, 0, buf + pos, glen - pos))
 		  != (size_t) -1
-		 && ret > len)
+		 && (ret > conv_len 
+			/* Skip separated high surrogate: */
+		     || ((buf [pos + glen - 1] & 0xFC00) == 0xD800 && glen - pos > 1)))
 	     --glen;
 	  if (ret == (size_t) -1)
 	    ret = 0;
 	  else
 	    {
-	      ret = sys_wcstombs ((char *) ptr, (size_t) -1,
+	      ret = sys_wcstombs ((char *) conv_ptr, (size_t) -1,
 				  buf + pos, glen - pos);
 	      pos = glen;
+	      /* If using read-ahead buffer, copy to class read-ahead buffer
+	         and deliver first byte. */
+	      if (conv_ptr == cprabuf)
+		{
+		  puts_readahead (cprabuf, ret);
+		  char * out_ptr = (char *) ptr;
+		  ret = 0;
+		  while (ret < len && (rach = get_readahead ()) >= 0)
+		    {
+		      * out_ptr++ = rach;
+		      ret++;
+		    }
+		}
 	    }
 	}
     }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-16 14:21       ` Thomas Wolff
@ 2012-08-16 15:24         ` Eric Blake
  2012-08-16 16:23           ` Corinna Vinschen
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Blake @ 2012-08-16 15:24 UTC (permalink / raw)
  To: cygwin-patches

[-- Attachment #1: Type: text/plain, Size: 818 bytes --]

On 08/16/2012 08:20 AM, Thomas Wolff wrote:

>>> MB_CUR_MAX does not work because its value is 1 at this point
>> So what about MB_LEN_MAX then?  There's no problem using a multiplier,
>> but a symbolic constant is always better than a numerical constant.
> I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from
> limits.h (note the "_" distinction :) ),
> because the latter, by its preceding comment, reserves the option to be
> changed into a dynamic function in the future, which could then possibly
> have the same problems as MB_CUR_MAX.

POSIX requires MB_LEN_MAX to be a constant, only MB_CUR_MAX can be
dynamic.  We cannot change MB_LEN_MAX to be dynamic in the future.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-16 15:24         ` Eric Blake
@ 2012-08-16 16:23           ` Corinna Vinschen
  2012-08-17  8:44             ` Thomas Wolff
  0 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2012-08-16 16:23 UTC (permalink / raw)
  To: cygwin-patches

On Aug 16 09:24, Eric Blake wrote:
> On 08/16/2012 08:20 AM, Thomas Wolff wrote:
> 
> >>> MB_CUR_MAX does not work because its value is 1 at this point
> >> So what about MB_LEN_MAX then?  There's no problem using a multiplier,
> >> but a symbolic constant is always better than a numerical constant.
> > I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from
> > limits.h (note the "_" distinction :) ),
> > because the latter, by its preceding comment, reserves the option to be
> > changed into a dynamic function in the future, which could then possibly
> > have the same problems as MB_CUR_MAX.
> 
> POSIX requires MB_LEN_MAX to be a constant, only MB_CUR_MAX can be
> dynamic.  We cannot change MB_LEN_MAX to be dynamic in the future.

...also, Cygwin's include/limits.h doesn't mention to convert to
a function.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-16 16:23           ` Corinna Vinschen
@ 2012-08-17  8:44             ` Thomas Wolff
  2012-08-17  9:23               ` Corinna Vinschen
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Wolff @ 2012-08-17  8:44 UTC (permalink / raw)
  To: cygwin-patches

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]

On 16.08.2012 18:22, Corinna Vinschen wrote:
> On Aug 16 09:24, Eric Blake wrote:
>> On 08/16/2012 08:20 AM, Thomas Wolff wrote:
>>
>>>>> MB_CUR_MAX does not work because its value is 1 at this point
>>>> So what about MB_LEN_MAX then?  There's no problem using a multiplier,
>>>> but a symbolic constant is always better than a numerical constant.
>>> I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from
>>> limits.h (note the "_" distinction :) ),
>>> because the latter, by its preceding comment, reserves the option to be
>>> changed into a dynamic function in the future, which could then possibly
>>> have the same problems as MB_CUR_MAX.
>> POSIX requires MB_LEN_MAX to be a constant, only MB_CUR_MAX can be
>> dynamic.  We cannot change MB_LEN_MAX to be dynamic in the future.
> ...also, Cygwin's include/limits.h doesn't mention to convert to
> a function.
Not sure how to interpret exactly what it mentions. Anyway, my updated 
patch (using MB_LEN_MAX) proposes a change here as well.
------
Thomas

[-- Attachment #2: clipboard-small-buffer.patch.4 --]
[-- Type: text/plain, Size: 3548 bytes --]

diff -rup sav/fhandler_clipboard.cc ./fhandler_clipboard.cc
--- sav/fhandler_clipboard.cc	2012-07-08 02:36:47.000000000 +0200
+++ ./fhandler_clipboard.cc	2012-08-17 10:34:41.968750000 +0200
@@ -222,6 +222,7 @@ fhandler_dev_clipboard::read (void *ptr,
   UINT formatlist[2];
   int format;
   LPVOID cb_data;
+  int rach;
 
   if (!OpenClipboard (NULL))
     {
@@ -243,12 +244,24 @@ fhandler_dev_clipboard::read (void *ptr,
       cygcb_t *clipbuf = (cygcb_t *) cb_data;
 
       if (pos < clipbuf->len)
-      	{
+	{
 	  ret = ((len > (clipbuf->len - pos)) ? (clipbuf->len - pos) : len);
 	  memcpy (ptr, clipbuf->data + pos , ret);
 	  pos += ret;
 	}
     }
+  else if ((rach = get_readahead ()) >= 0)
+    {
+      /* Deliver from read-ahead buffer. */
+      char * out_ptr = (char *) ptr;
+      * out_ptr++ = rach;
+      ret = 1;
+      while (ret < len && (rach = get_readahead ()) >= 0)
+	{
+	  * out_ptr++ = rach;
+	  ret++;
+	}
+    }
   else
     {
       wchar_t *buf = (wchar_t *) cb_data;
@@ -256,25 +269,54 @@ fhandler_dev_clipboard::read (void *ptr,
       size_t glen = GlobalSize (hglb) / sizeof (WCHAR) - 1;
       if (pos < glen)
 	{
+	  /* If caller's buffer is too small to hold at least one 
+	     max-size character, redirect algorithm to local 
+	     read-ahead buffer, finally fill class read-ahead buffer 
+	     with result and feed caller from there. */
+	  char * conv_ptr = (char *) ptr;
+	  size_t conv_len = len;
+#define cprabuf_len MB_LEN_MAX	/* max MB_CUR_MAX of all encodings */
+	  char cprabuf [cprabuf_len];
+	  if (len < cprabuf_len)
+	    {
+	      conv_ptr = cprabuf;
+	      conv_len = cprabuf_len;
+	    }
+
 	  /* Comparing apples and oranges here, but the below loop could become
 	     extremly slow otherwise.  We rather return a few bytes less than
 	     possible instead of being even more slow than usual... */
-	  if (glen > pos + len)
-	    glen = pos + len;
+	  if (glen > pos + conv_len)
+	    glen = pos + conv_len;
 	  /* This loop is necessary because the number of bytes returned by
 	     sys_wcstombs does not indicate the number of wide chars used for
 	     it, so we could potentially drop wide chars. */
 	  while ((ret = sys_wcstombs (NULL, 0, buf + pos, glen - pos))
 		  != (size_t) -1
-		 && ret > len)
+		 && (ret > conv_len 
+			/* Skip separated high surrogate: */
+		     || ((buf [pos + glen - 1] & 0xFC00) == 0xD800 && glen - pos > 1)))
 	     --glen;
 	  if (ret == (size_t) -1)
 	    ret = 0;
 	  else
 	    {
-	      ret = sys_wcstombs ((char *) ptr, (size_t) -1,
+	      ret = sys_wcstombs ((char *) conv_ptr, (size_t) -1,
 				  buf + pos, glen - pos);
 	      pos = glen;
+	      /* If using read-ahead buffer, copy to class read-ahead buffer
+	         and deliver first byte. */
+	      if (conv_ptr == cprabuf)
+		{
+		  puts_readahead (cprabuf, ret);
+		  char * out_ptr = (char *) ptr;
+		  ret = 0;
+		  while (ret < len && (rach = get_readahead ()) >= 0)
+		    {
+		      * out_ptr++ = rach;
+		      ret++;
+		    }
+		}
 	    }
 	}
     }
diff -rup sav/include/limits.h ./include/limits.h
--- sav/include/limits.h	2011-07-21 22:21:49.000000000 +0200
+++ ./include/limits.h	2012-08-16 17:48:34.847141100 +0200
@@ -36,8 +36,7 @@ details. */
 
 /* Maximum length of a multibyte character.  */
 #ifndef MB_LEN_MAX
-/* TODO: This is newlib's max value.  We should probably rather define our
-   own _mbtowc_r and _wctomb_r functions which are only codepage dependent. */
+/* Use value from newlib although 4 is sufficient */
 #define MB_LEN_MAX 8
 #endif
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-17  8:44             ` Thomas Wolff
@ 2012-08-17  9:23               ` Corinna Vinschen
  2012-08-17 13:05                 ` Thomas Wolff
  0 siblings, 1 reply; 10+ messages in thread
From: Corinna Vinschen @ 2012-08-17  9:23 UTC (permalink / raw)
  To: cygwin-patches

On Aug 17 10:44, Thomas Wolff wrote:
> On 16.08.2012 18:22, Corinna Vinschen wrote:
> >On Aug 16 09:24, Eric Blake wrote:
> >>On 08/16/2012 08:20 AM, Thomas Wolff wrote:
> >>
> >>>>>MB_CUR_MAX does not work because its value is 1 at this point
> >>>>So what about MB_LEN_MAX then?  There's no problem using a multiplier,
> >>>>but a symbolic constant is always better than a numerical constant.
> >>>I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from
> >>>limits.h (note the "_" distinction :) ),
> >>>because the latter, by its preceding comment, reserves the option to be
> >>>changed into a dynamic function in the future, which could then possibly
> >>>have the same problems as MB_CUR_MAX.
> >>POSIX requires MB_LEN_MAX to be a constant, only MB_CUR_MAX can be
> >>dynamic.  We cannot change MB_LEN_MAX to be dynamic in the future.
> >...also, Cygwin's include/limits.h doesn't mention to convert to
> >a function.
> Not sure how to interpret exactly what it mentions.

This is from the time I was working on the extended locale support
in Cygwin 1.7.  I have not the faintest idea anymore what I was trying
to say with this comment.

>  Anyway, my
> updated patch (using MB_LEN_MAX) proposes a change here as well.

Thanks.  I dropped the hint that 4 is enough.  I'm not so sure about
that.  Linux, for instance, defines MB_LEN_MAX as 16.

Other than that, patch applied.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: /dev/clipboard pasting with small read() buffer
  2012-08-17  9:23               ` Corinna Vinschen
@ 2012-08-17 13:05                 ` Thomas Wolff
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Wolff @ 2012-08-17 13:05 UTC (permalink / raw)
  To: cygwin-patches

On 17.08.2012 11:22, Corinna Vinschen wrote:
> ...
>>   Anyway, my updated patch (using MB_LEN_MAX) proposes a change here as well.
> Thanks.  I dropped the hint that 4 is enough.  I'm not so sure about
> that.  Linux, for instance, defines MB_LEN_MAX as 16.
SunOS defines it as 5. 
http://www.kernel.org/doc/man-pages/online/pages/man3/MB_LEN_MAX.3.html 
says in glibc it is typically 6 (which would be needed for original 
UTF-8 covering 31-bit ISO-10646).

> Other than that, patch applied.
Thanks
Thomas

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-08-17 13:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-14 20:56 /dev/clipboard pasting with small read() buffer Thomas Wolff
2012-08-16  9:34 ` Corinna Vinschen
2012-08-16 12:12   ` Thomas Wolff
2012-08-16 12:31     ` Corinna Vinschen
2012-08-16 14:21       ` Thomas Wolff
2012-08-16 15:24         ` Eric Blake
2012-08-16 16:23           ` Corinna Vinschen
2012-08-17  8:44             ` Thomas Wolff
2012-08-17  9:23               ` Corinna Vinschen
2012-08-17 13:05                 ` Thomas Wolff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).