public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR/42686] Align the help text output
@ 2010-02-25 10:32 Shujing Zhao
  2010-02-25 16:34 ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-02-25 10:32 UTC (permalink / raw)
  To: GCC Patches; +Cc: Joseph S. Myers, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

Hi,

This patch is to align the help text output when it includes the characters that 
use three bytes in UTF-8 in some languages, such as Chinese.
The option name is left-aligned with the width 27 or the length of itself from 
the 3th column. The help text is left-aligned with the width 50 (default) from 
the 31th column.
Tested on i686-pc-linux-gnu with no regression to the "C" locale environment.

OK for trunk?

Thanks
Pearly

[-- Attachment #2: ChangeLog --]
[-- Type: text/plain, Size: 261 bytes --]

2010-02-25  Shujing Zhao  <pearly.zhao@oracle.com>

	PR translation/42686
	* opts.c (wrap_help): Align the help output when it includes the
	characters that use three bytes in UTF-8.
	(print_filtered_help): Split the help string to option name and help
	text.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: pr42686.patch --]
[-- Type: text/x-patch; name="pr42686.patch", Size: 1668 bytes --]

Index: opts.c
===================================================================
--- opts.c	(修订版 157061)
+++ opts.c	(工作拷贝)
@@ -1158,8 +1158,9 @@ wrap_help (const char *help,
 	   unsigned int columns)
 {
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int remaining, room, len, width_diff;
 
+  width_diff = item_width - gcc_gettext_width(item);
   remaining = strlen (help);
 
   do
@@ -1183,10 +1184,20 @@ wrap_help (const char *help,
 		       && help[i + 1] != ' '
 		       && i > 0 && ISALPHA (help[i - 1]))
 		len = i + 1;
+	      /* For the characters use three bytes in UTF-8 in some languages,
+ 		 such as Chinese, move three bytes everytime. */
+	      else if (help[i] < 0 && help[i + 1] < 0 && help[i + 2] < 0)
+		{
+		  i = i + 2;
+		  len = i + 1;
+		  room = room + 1;
+		}
 	    }
 	}
 
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
+      printf ("  %-*.*s %.*s\n", 
+	      col_width + width_diff, item_width, item, len, help);
+      width_diff = 0;
       item_width = 0;
       while (help[len] == ' ')
 	len++;
@@ -1242,6 +1253,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1290,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *)alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-02-25 10:32 [PATCH PR/42686] Align the help text output Shujing Zhao
@ 2010-02-25 16:34 ` Joseph S. Myers
  2010-03-10 10:26   ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-02-25 16:34 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Thu, 25 Feb 2010, Shujing Zhao wrote:

> Hi,
> 
> This patch is to align the help text output when it includes the characters
> that use three bytes in UTF-8 in some languages, such as Chinese.
> The option name is left-aligned with the width 27 or the length of itself from
> the 3th column. The help text is left-aligned with the width 50 (default) from
> the 31th column.

I can't understand this description of what you intend to achieve here.

> +  width_diff = item_width - gcc_gettext_width(item);

Missing space before "(".

> @@ -1183,10 +1184,20 @@ wrap_help (const char *help,
>  		       && help[i + 1] != ' '
>  		       && i > 0 && ISALPHA (help[i - 1]))
>  		len = i + 1;
> +	      /* For the characters use three bytes in UTF-8 in some languages,
> + 		 such as Chinese, move three bytes everytime. */
> +	      else if (help[i] < 0 && help[i + 1] < 0 && help[i + 2] < 0)
> +		{
> +		  i = i + 2;
> +		  len = i + 1;
> +		  room = room + 1;

This does not make sense to me.  Checking whether a "char" is negative is 
not meaningful in portable code since char may be either signed or 
unsigned, the locale may not use UTF-8, and some characters may have other 
lengths.

You have a translated help text, in whatever the locale character set is.  
You need to find a suitable point to break a line.  In so doing, you need 
to step by multibyte characters (using interfaces such as mbrtowc to do so 
- put wrappers in intl.c that fall back to single characters if 
--disable-nls or the relevant interfaces are not available).  You can then 
examine the wide characters - not the multibyte sequences - to see what is 
a suitable point to wrap the line, and determine the actual display width 
of each wide character in the process.  I don't know if comparing with 
L'-', L'/' and L' ' is really appropriate for all languages, but don't 
have a better option.  However, the locale-independent ISALPHA, used in 
the present code, clearly seems inappropriate; using iswalpha (again with 
a fallback for --disable-nls or when this function is not present) would 
be better.

I'd be inclined to put the entire logic for testing whether something is a 
suitable point for wrapping, and for determining what characters to skip 
after wrapping, in a function in intl.c.  (The code that supports wrapping 
lines for diagnostics suffers the same problems with counting bytes rather 
than display width, and might be able to share some of the fix.)

> -      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
> +      printf ("  %-*.*s %.*s\n", 
> +	      col_width + width_diff, item_width, item, len, help);
> +      width_diff = 0;

Please note that both width and precision in printf formats count *bytes* 
not *multibyte characters* or *screen columns*.  So once you've corrected 
the logic above and know both how wide the text you want to print is in 
screen columns, and how many bytes it is, you need to think carefully 
about what the correct width and precision parameters are.

> +	  buf = (char *)alloca (len + 1);

Missing space after ")" in cast.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-02-25 16:34 ` Joseph S. Myers
@ 2010-03-10 10:26   ` Shujing Zhao
  2010-03-10 17:19     ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-10 10:26 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 3362 bytes --]

Hi Joseph, all
On 02/26/2010 12:09 AM, Joseph S. Myers wrote:
>> @@ -1183,10 +1184,20 @@ wrap_help (const char *help,
>>  		       && help[i + 1] != ' '
>>  		       && i > 0 && ISALPHA (help[i - 1]))
>>  		len = i + 1;
>> +	      /* For the characters use three bytes in UTF-8 in some languages,
>> + 		 such as Chinese, move three bytes everytime. */
>> +	      else if (help[i] < 0 && help[i + 1] < 0 && help[i + 2] < 0)
>> +		{
>> +		  i = i + 2;
>> +		  len = i + 1;
>> +		  room = room + 1;
> 
> This does not make sense to me.  Checking whether a "char" is negative is 
> not meaningful in portable code since char may be either signed or 
> unsigned, the locale may not use UTF-8, and some characters may have other 
> lengths.
> 
> You have a translated help text, in whatever the locale character set is.  
> You need to find a suitable point to break a line.  In so doing, you need 
> to step by multibyte characters (using interfaces such as mbrtowc to do so 
> - put wrappers in intl.c that fall back to single characters if 
> --disable-nls or the relevant interfaces are not available).  You can then 
> examine the wide characters - not the multibyte sequences - to see what is 
> a suitable point to wrap the line, and determine the actual display width 
> of each wide character in the process.  I don't know if comparing with 
> L'-', L'/' and L' ' is really appropriate for all languages, but don't 
> have a better option.  However, the locale-independent ISALPHA, used in 
> the present code, clearly seems inappropriate; using iswalpha (again with 
> a fallback for --disable-nls or when this function is not present) would 
> be better.
> 
> I'd be inclined to put the entire logic for testing whether something is a 
> suitable point for wrapping, and for determining what characters to skip 
> after wrapping, in a function in intl.c.  (The code that supports wrapping 
> lines for diagnostics suffers the same problems with counting bytes rather 
> than display width, and might be able to share some of the fix.)
> 
>> -      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
>> +      printf ("  %-*.*s %.*s\n", 
>> +	      col_width + width_diff, item_width, item, len, help);
>> +      width_diff = 0;
> 
> Please note that both width and precision in printf formats count *bytes* 
> not *multibyte characters* or *screen columns*.  So once you've corrected 
> the logic above and know both how wide the text you want to print is in 
> screen columns, and how many bytes it is, you need to think carefully 
> about what the correct width and precision parameters are.
> 
>> +	  buf = (char *)alloca (len + 1);
> 
> Missing space after ")" in cast.
> 
Thanks your instruction for wide character handling.

The updated patch handle the help string in a separate function get_aligned_len 
when the number of bytes is not equal to the number of column positions for a 
wide-character string.
The functions when --disable-nls or the relevant interfaces are not available 
are also defined. The questions is the define of get_col_width and 
get_aligned_len would never be used, because gcc_gettext_width (s) is always 
equal to strlen (s) when --disable-nls. Then, do you think the definitions of 
get_col_width and get_aligned_len for --disable-nls are necessary?
Tested on i686-pc-linux-gnu.
Is it ok?

Thanks
Pearly

[-- Attachment #2: ChangeLog --]
[-- Type: text/plain, Size: 423 bytes --]

2010-03-10  Shujing Zhao  <pearly.zhao@oracle.com>

	PR translation/42686
	* intl.h (get_col_width, get_aligned_len): New prototype.
	* intl.c (wctomb_nbytes, get_col_width): New function.
	(get_aligned_len): New function to handle the wide character help
	string.
	* opts.c (wrap_help): Align the help output when it includes wide
	characters.
	(print_filtered_help): Split the help string to option name and help
	text.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 42686.patch --]
[-- Type: text/x-patch; name="42686.patch", Size: 6265 bytes --]

Index: intl.h
===================================================================
--- intl.h	(修订版 157331)
+++ intl.h	(工作拷贝)
@@ -30,6 +30,9 @@
 #include <libintl.h>
 extern void gcc_init_libintl (void);
 extern size_t gcc_gettext_width (const char *);
+extern unsigned int get_col_width (const char *, unsigned int);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+                                     unsigned int);
 #else
 /* Stubs.  */
 # undef textdomain
@@ -41,6 +44,8 @@ extern size_t gcc_gettext_width (const c
 # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n)
 # define gcc_init_libintl()	/* nothing */
 # define gcc_gettext_width(s) strlen(s)
+# define get_col_width(s, room) room
+# define get_aligned_len(s, room, remaining) remaining
 
 extern const char *fake_ngettext(const char *singular,const char *plural,
                                  unsigned long int n);
Index: intl.c
===================================================================
--- intl.c	(修订版 157331)
+++ intl.c	(工作拷贝)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,94 @@ gcc_gettext_width (const char *msgstr)
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Return the number of bytes when converts the wide character WC
+   to its multibyte representation. */
+
+static int
+wctomb_nbytes (wchar_t wc)
+{
+  char *pmb = (char *)xmalloc (MB_CUR_MAX);
+  int nbytes = wctomb (pmb, wc);
+  if (pmb)
+    free (pmb);
+  return nbytes;
+}
+
+/* Expand the ROOM when the number of bytes of MSGSTR is larger than
+   the width in columns. */
+
+unsigned int
+get_col_width (const char *msgstr, unsigned int room)
+{
+  int i;
+  int screen_cols, nbytes;
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  for ( i = 0; wmsgstr [i]; i++)
+    {
+      /* The number of column position for a character. */
+      screen_cols = wcwidth (wmsgstr [i]);
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      if (screen_cols != nbytes)
+        room = room + (nbytes - screen_cols);
+    }
+  return room;
+}
+
+/* Return the number of bytes of MSGSTR that can be printed within ROOM.
+   REMAINING is the overall length of MSTSTR. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		     unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = 0;
+  unsigned int pos = 0;
+  unsigned int wc_num = 0;
+  int nbytes;
+  
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  
+  for (i = 0; i < nwcs; i++)
+    {
+      unsigned int maybe_len = len + (i - pos);
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      if (maybe_len == remaining)
+        len = maybe_len;
+      else if (maybe_len >= room)
+        break;
+      else if (nbytes > 1)
+        {
+          wc_num++;
+          len = wc_num * nbytes + (i + 1 - wc_num);
+          room = room + (nbytes - wcwidth (wmsgstr [i]));
+	  /* Keep the puncts printed with the previous word. */
+          if (iswpunct (wmsgstr [i + 1])
+	      && iswalpha (wmsgstr [i -1]))
+            {
+              i++;
+              nbytes = wctomb_nbytes (wmsgstr [i]);
+	      if (nbytes > 1)
+		{
+                  wc_num++;
+                  len = wc_num * nbytes + (i + 1 - wc_num);
+	          room = room + (nbytes - wcwidth (wmsgstr [i]));
+		}
+	      else
+		len = len + 1;
+            }
+          pos = i;
+        }
+    }
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -117,6 +206,19 @@ gcc_gettext_width (const char *msgstr)
   return strlen (msgstr);
 }
 
+unsigned int
+get_col_width (const char *msgstr, unsigned int room)
+{
+  return room;
+}
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+  return remaining;
+}
+
 #endif
 
 #endif /* ENABLE_NLS */
Index: opts.c
===================================================================
--- opts.c	(修订版 157331)
+++ opts.c	(工作拷贝)
@@ -1161,6 +1161,8 @@ wrap_help (const char *help,
   unsigned int remaining, room, len;
 
   remaining = strlen (help);
+  if (item_width != gcc_gettext_width (item))
+    col_width = get_col_width (item, col_width);
 
   do
     {
@@ -1173,20 +1175,27 @@ wrap_help (const char *help,
 	{
 	  unsigned int i;
 
-	  for (i = 0; help[i]; i++)
+	  if (len != gcc_gettext_width (help))
 	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
+	      len = get_aligned_len (help, room, remaining);
+	    }
+	  else
+	    {
+              for (i = 0; help[i]; i++)
+	        {
+	          if (i >= room && len != remaining)
+		    break;
+	          if (help[i] == ' ')
+		    len = i;
+	          else if ((help[i] == '-' || help[i] == '/')
+		           && help[i + 1] != ' '
+		           && i > 0 && ISALPHA (help[i - 1]))
+		    len = i + 1;
+	        }
 	    }
 	}
-
       printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
+      col_width = LEFT_COLUMN;
       item_width = 0;
       while (help[len] == ' ')
 	len++;
@@ -1242,6 +1251,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1288,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *)alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-10 10:26   ` Shujing Zhao
@ 2010-03-10 17:19     ` Joseph S. Myers
  2010-03-12 11:34       ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-03-10 17:19 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Wed, 10 Mar 2010, Shujing Zhao wrote:

> +/* Expand the ROOM when the number of bytes of MSGSTR is larger than
> +   the width in columns. */
> +
> +unsigned int
> +get_col_width (const char *msgstr, unsigned int room)

The comment on this function makes no sense to me at all.  What does 
"Expand the ROOM" mean?  It sounds as if ROOM is a parameter passed by 
reference that might be changed by the function - but that clearly isn't 
the case for a parameter of type unsigned int.  What are the semantics of 
the operand ROOM?  What are the semantics of the return value?  (But see 
below on how "the number of bytes of MSGSTR is larger than the width in 
columns" is itself not a sensible concept for the code to be considering 
in the first place.)

> +      if (screen_cols != nbytes)

This comparison is conceptually confused.  In logical terms, "columns" and 
"bytes" are different data types that it does not make sense to compare.  
It so happens that in the present C code they have the same static type so 
the host compiler doesn't give an error for this comparison - but it is 
still confused, and you need to arrange the code so it doesn't try to mix 
different types like this.  This means having a very clear notion of 
whether a particular variable counts columns, bytes or characters, and of 
how you do explicit computations of the number of each kind of object in a 
particular part of a string, and no comparisons between counts of 
different things.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-10 17:19     ` Joseph S. Myers
@ 2010-03-12 11:34       ` Shujing Zhao
  2010-03-12 17:09         ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-12 11:34 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 2293 bytes --]

On 03/11/2010 01:05 AM, Joseph S. Myers wrote:
> On Wed, 10 Mar 2010, Shujing Zhao wrote:
> 
>> +/* Expand the ROOM when the number of bytes of MSGSTR is larger than
>> +   the width in columns. */
>> +
>> +unsigned int
>> +get_col_width (const char *msgstr, unsigned int room)
> 
> The comment on this function makes no sense to me at all.  What does 
> "Expand the ROOM" mean?  It sounds as if ROOM is a parameter passed by 
> reference that might be changed by the function - but that clearly isn't 
> the case for a parameter of type unsigned int.  What are the semantics of 
> the operand ROOM?  What are the semantics of the return value?  (But see 
> below on how "the number of bytes of MSGSTR is larger than the width in 
> columns" is itself not a sensible concept for the code to be considering 
> in the first place.)
> 
>> +      if (screen_cols != nbytes)
> 
> This comparison is conceptually confused.  In logical terms, "columns" and 
> "bytes" are different data types that it does not make sense to compare.  
> It so happens that in the present C code they have the same static type so 
> the host compiler doesn't give an error for this comparison - but it is 
> still confused, and you need to arrange the code so it doesn't try to mix 
> different types like this.  This means having a very clear notion of 
> whether a particular variable counts columns, bytes or characters, and of 
> how you do explicit computations of the number of each kind of object in a 
> particular part of a string, and no comparisons between counts of 
> different things.

Thanks.
For the number of bytes of a wide character is different with the display column 
width, I have always tried to find the relationship between the display column 
width with justification width that printf counted by bytes.

This new patch fixed the function get_col_width totally. It is to return the 
really field width when left-justify print the wide character strings. The 
design splits the width to two parts. One is to print the strings, and the other 
is to print the spaces when the wide character display width less than 
LEFT_COLUMN. Composed the two parts would get the really width counted by bytes 
that can be used by printf.

The print of help strings looks very fine. Is it ok?

Thanks
Pearly

[-- Attachment #2: 42686.patch --]
[-- Type: text/x-patch, Size: 6137 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157401)
+++ intl.h	(working copy)
@@ -30,6 +30,9 @@
 #include <libintl.h>
 extern void gcc_init_libintl (void);
 extern size_t gcc_gettext_width (const char *);
+extern unsigned int get_col_width (const char *, unsigned int);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+                                     unsigned int);
 #else
 /* Stubs.  */
 # undef textdomain
@@ -41,6 +44,8 @@ extern size_t gcc_gettext_width (const c
 # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n)
 # define gcc_init_libintl()	/* nothing */
 # define gcc_gettext_width(s) strlen(s)
+# define get_col_width(s, room) room
+# define get_aligned_len(s, room, remaining) remaining
 
 extern const char *fake_ngettext(const char *singular,const char *plural,
                                  unsigned long int n);
Index: intl.c
===================================================================
--- intl.c	(revision 157401)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,84 @@ gcc_gettext_width (const char *msgstr)
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Return the number of bytes when converts the wide character WC
+   to its multibyte representation. */
+
+static int
+wctomb_nbytes (wchar_t wc)
+{
+  char *pmb = (char *)xmalloc (MB_CUR_MAX);
+  int nbytes = wctomb (pmb, wc);
+  if (pmb)
+    free (pmb);
+  return nbytes;
+}
+
+/* Return the really width needed when left-justify print 
+   the MSGSTR which came from gettext with the column width ROOM. */
+
+unsigned int
+get_col_width (const char *msgstr, unsigned int room)
+{
+  unsigned int width = gcc_gettext_width (msgstr);
+  if (width < room)
+    /* The (room -width) is the number spaces to print for aligned. */
+    width = strlen (msgstr) + (room - width);
+  return width;
+}
+
+/* Return the number of bytes of MSGSTR that can be printed within ROOM.
+   REMAINING is the overall length of MSTSTR. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		     unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = 0;
+  unsigned int pos = 0;
+  unsigned int wc_num = 0;
+  int nbytes;
+  
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  
+  for (i = 0; i < nwcs; i++)
+    {
+      unsigned int maybe_len = len + (i - pos);
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      if (maybe_len == remaining)
+        len = maybe_len;
+      else if (maybe_len >= room)
+        break;
+      else if (nbytes > 1)
+        {
+          wc_num++;
+          len = wc_num * nbytes + (i + 1 - wc_num);
+          room = room + (nbytes - wcwidth (wmsgstr [i]));
+	  /* Keep the puncts printed with the previous word. */
+          if (iswpunct (wmsgstr [i + 1])
+	      && iswalpha (wmsgstr [i -1]))
+            {
+              i++;
+              nbytes = wctomb_nbytes (wmsgstr [i]);
+	      if (nbytes > 1)
+		{
+                  wc_num++;
+                  len = wc_num * nbytes + (i + 1 - wc_num);
+	          room = room + (nbytes - wcwidth (wmsgstr [i]));
+		}
+	      else
+		len = len + 1;
+            }
+          pos = i;
+        }
+    }
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -117,6 +196,19 @@ gcc_gettext_width (const char *msgstr)
   return strlen (msgstr);
 }
 
+unsigned int
+get_col_width (const char *msgstr, unsigned int room)
+{
+  return room;
+}
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+  return remaining;
+}
+
 #endif
 
 #endif /* ENABLE_NLS */
Index: opts.c
===================================================================
--- opts.c	(revision 157401)
+++ opts.c	(working copy)
@@ -1157,11 +1157,16 @@ wrap_help (const char *help,
 	   unsigned int item_width,
 	   unsigned int columns)
 {
-  unsigned int col_width = LEFT_COLUMN;
+  unsigned int col_width;
   unsigned int remaining, room, len;
 
   remaining = strlen (help);
 
+  if (item_width == gcc_gettext_width (item))
+    col_width = LEFT_COLUMN;
+  else
+    col_width = get_col_width (item, LEFT_COLUMN);
+
   do
     {
       room = columns - 3 - MAX (col_width, item_width);
@@ -1173,20 +1178,27 @@ wrap_help (const char *help,
 	{
 	  unsigned int i;
 
-	  for (i = 0; help[i]; i++)
+	  if (len != gcc_gettext_width (help))
+	    {
+	      len = get_aligned_len (help, room, remaining);
+	    }
+	  else
 	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
+              for (i = 0; help[i]; i++)
+	        {
+	          if (i >= room && len != remaining)
+		    break;
+	          if (help[i] == ' ')
+		    len = i;
+	          else if ((help[i] == '-' || help[i] == '/')
+		           && help[i + 1] != ' '
+		           && i > 0 && ISALPHA (help[i - 1]))
+		    len = i + 1;
+	        }
 	    }
 	}
-
       printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
+      col_width = LEFT_COLUMN;
       item_width = 0;
       while (help[len] == ' ')
 	len++;
@@ -1242,6 +1254,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1291,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *)alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-12 11:34       ` Shujing Zhao
@ 2010-03-12 17:09         ` Joseph S. Myers
  2010-03-15  9:28           ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-03-12 17:09 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Fri, 12 Mar 2010, Shujing Zhao wrote:

> +/* Return the really width needed when left-justify print 
> +   the MSGSTR which came from gettext with the column width ROOM. */

I still can't make sense of this comment, and can't review the patch 
without clear English comments explaining the semantics of the various 
functions.  What is "the really width"?  Needed for that?  What is "when 
left-justify print"?  What is "column width ROOM"?

If this is something to do with the peculiarities of printf functions 
(counting bytes, when you are concerned with columns for alignment), you 
might wish to consider simply not using printf width functionality with 
strings that may contain non-1-column characters.  Output a certain number 
of bytes of the help text in one call, and a certain number of spaces in 
another call, and include clear comments explaining the printf 
peculiarities that are the reason for doing so.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-12 17:09         ` Joseph S. Myers
@ 2010-03-15  9:28           ` Shujing Zhao
  2010-03-15 10:58             ` Shujing Zhao
  2010-03-15 11:42             ` Joseph S. Myers
  0 siblings, 2 replies; 20+ messages in thread
From: Shujing Zhao @ 2010-03-15  9:28 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 1428 bytes --]

On 03/13/2010 12:59 AM, Joseph S. Myers wrote:
> On Fri, 12 Mar 2010, Shujing Zhao wrote:
> 
>> +/* Return the really width needed when left-justify print 
>> +   the MSGSTR which came from gettext with the column width ROOM. */
> 
> I still can't make sense of this comment, and can't review the patch 
> without clear English comments explaining the semantics of the various 
> functions.  What is "the really width"?  Needed for that?  What is "when 
> left-justify print"?  What is "column width ROOM"?
> 
yes, it looks quite confused now...
> If this is something to do with the peculiarities of printf functions 
> (counting bytes, when you are concerned with columns for alignment), you 
> might wish to consider simply not using printf width functionality with 
> strings that may contain non-1-column characters.  Output a certain number 
> of bytes of the help text in one call, and a certain number of spaces in 
> another call, and include clear comments explaining the printf 
> peculiarities that are the reason for doing so.
> 
Yes. Breaking the printf makes the code clearer and easier. Thanks to point out 
it. At the new patch, I broke the printf into output item and output help 
string. The code to output the item that includes wide character is moved back 
to opts.c, for it doesn't use any wcs functions. The others are kept with the 
old patch.
Thanks for your patience reviewing the patches.

Is it OK?

[-- Attachment #2: 03151650.patch --]
[-- Type: text/x-patch, Size: 6061 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157453)
+++ intl.h	(working copy)
@@ -30,6 +30,8 @@
 #include <libintl.h>
 extern void gcc_init_libintl (void);
 extern size_t gcc_gettext_width (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+                                     unsigned int);
 #else
 /* Stubs.  */
 # undef textdomain
@@ -41,6 +43,7 @@ extern size_t gcc_gettext_width (const c
 # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n)
 # define gcc_init_libintl()	/* nothing */
 # define gcc_gettext_width(s) strlen(s)
+# define get_aligned_len(s, room, remaining) remaining
 
 extern const char *fake_ngettext(const char *singular,const char *plural,
                                  unsigned long int n);
Index: intl.c
===================================================================
--- intl.c	(revision 157453)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,70 @@ gcc_gettext_width (const char *msgstr)
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Return the number of bytes when converts the wide character WC
+   to its multibyte representation. */
+
+static int
+wctomb_nbytes (wchar_t wc)
+{
+  char *pmb = (char *)xmalloc (MB_CUR_MAX);
+  int nbytes = wctomb (pmb, wc);
+  if (pmb)
+    free (pmb);
+  return nbytes;
+}
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. */ 
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+	         unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = 0;
+  unsigned int pos = 0;
+  unsigned int wc_num = 0;
+  int nbytes;
+  
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  
+  for (i = 0; i < nwcs; i++)
+    {
+      unsigned int maybe_len = len + (i - pos);
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      if (maybe_len == remaining)
+        len = maybe_len;
+      else if (maybe_len >= room)
+        break;
+      else if (nbytes > 1)
+        {
+          wc_num++;
+          len = wc_num * nbytes + (i + 1 - wc_num);
+          room = room + (nbytes - wcwidth (wmsgstr [i]));
+	  /* Keep the puncts printed with the previous word. */
+          if (iswpunct (wmsgstr [i + 1])
+	      && iswalpha (wmsgstr [i -1]))
+            {
+              i++;
+              nbytes = wctomb_nbytes (wmsgstr [i]);
+	      if (nbytes > 1)
+		{
+                  wc_num++;
+                  len = wc_num * nbytes + (i + 1 - wc_num);
+	          room = room + (nbytes - wcwidth (wmsgstr [i]));
+		}
+	      else
+		len = len + 1;
+            }
+          pos = i;
+        }
+    }
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -117,6 +182,13 @@ gcc_gettext_width (const char *msgstr)
   return strlen (msgstr);
 }
 
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+  return remaining;
+}
+
 #endif
 
 #endif /* ENABLE_NLS */
Index: opts.c
===================================================================
--- opts.c	(revision 157453)
+++ opts.c	(working copy)
@@ -1158,12 +1158,31 @@ wrap_help (const char *help,
 	   unsigned int columns)
 {
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int remaining, room, len, item_col_width;
+  unsigned int i;
 
   remaining = strlen (help);
+  item_col_width = gcc_gettext_width (item); 
 
   do
     {
+      if (item_width == item_col_width
+          || item_width == 0)
+        printf ("  %-*.*s", col_width, item_width, item);
+      /* Handle the item that includes wide character. */
+      else
+        {
+          printf ("  %s", item);
+          /* Output a certain number of spaces for alignment. */
+          if (item_col_width < col_width) 
+            { 
+              unsigned int n_spaces = col_width - item_col_width; 
+              const char *space = " ";
+              for (i = 0; i < n_spaces; i++) 
+                printf ("%s", space); 
+            }
+        }
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
@@ -1171,22 +1190,25 @@ wrap_help (const char *help,
 
       if (room < len)
 	{
-	  unsigned int i;
-
-	  for (i = 0; help[i]; i++)
+	  if (len != gcc_gettext_width (help))
+            len = get_aligned_len (help, room, remaining);
+	  else
 	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
+              for (i = 0; help[i]; i++)
+	        {
+	          if (i >= room && len != remaining)
+		    break;
+	          if (help[i] == ' ')
+		    len = i;
+	          else if ((help[i] == '-' || help[i] == '/')
+		           && help[i + 1] != ' '
+		           && i > 0 && ISALPHA (help[i - 1]))
+		    len = i + 1;
+	        }
 	    }
 	}
 
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
+      printf ( " %.*s\n", len, help); 
       item_width = 0;
       while (help[len] == ' ')
 	len++;
@@ -1242,6 +1264,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1301,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *)alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-15  9:28           ` Shujing Zhao
@ 2010-03-15 10:58             ` Shujing Zhao
  2010-03-15 11:42             ` Joseph S. Myers
  1 sibling, 0 replies; 20+ messages in thread
From: Shujing Zhao @ 2010-03-15 10:58 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

On 03/15/2010 05:13 PM, Shujing Zhao wrote:
> Yes. Breaking the printf makes the code clearer and easier. Thanks to 
> point out it. At the new patch, I broke the printf into output item and 
> output help string. The code to output the item that includes wide 
> character is moved back to opts.c, for it doesn't use any wcs functions. 
> The others are kept with the old patch.
> Thanks for your patience reviewing the patches.

I reconsider the design of get_aligned_len after reading some material of line 
break at gnulib. And get_aligned_len should be changed to keep with the change 
of output item at the last patch.

Is it ok?

Thanks
Pearly

[-- Attachment #2: 03151742.patch --]
[-- Type: text/x-patch, Size: 6155 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157453)
+++ intl.h	(working copy)
@@ -30,6 +30,8 @@
 #include <libintl.h>
 extern void gcc_init_libintl (void);
 extern size_t gcc_gettext_width (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+                                     unsigned int);
 #else
 /* Stubs.  */
 # undef textdomain
@@ -41,6 +43,7 @@
 # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n)
 # define gcc_init_libintl()	/* nothing */
 # define gcc_gettext_width(s) strlen(s)
+# define get_aligned_len (s, room, remaining) remaining
 
 extern const char *fake_ngettext(const char *singular,const char *plural,
                                  unsigned long int n);
Index: intl.c
===================================================================
--- intl.c	(revision 157453)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,87 @@
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Return the number of bytes when converts the wide character WC
+   to its multibyte representation. */
+
+static int
+wctomb_nbytes (wchar_t wc)
+{
+  char *pmb = (char *)xmalloc (MB_CUR_MAX);
+  int nbytes = wctomb (pmb, wc);
+  if (pmb)
+    free (pmb);
+  return nbytes;
+}
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. */ 
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+	         unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = 0;
+  unsigned int pos = 0;
+  int nbytes;
+  int len_bk;
+  wchar_t *buf;
+  unsigned int colwide = 0;
+  size_t remaining_width;
+  
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  remaining_width = wcswidth (wmsgstr, nwcs);
+  
+  for (i = 0; i < nwcs; i++)
+    {
+      unsigned int maybe_width = colwide + (i - pos);
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      if (maybe_width == remaining_width)
+        len = remaining;
+      else if (colwide > room)
+        {
+          len = len_bk;
+          break;
+        }
+      else if (colwide == room || maybe_width > room)
+        break;
+      else if (nbytes > 1)
+        {
+          buf = XALLOCAVEC (wchar_t, i + 2);
+          wcsncpy (buf, wmsgstr, i + 1);
+          buf [i + 1] = L'\0';
+          len_bk = len;
+          len = wcstombs (0, buf, 0);
+          colwide = wcswidth (buf, i + 1);
+	  /* Keep the puncts printed with the previous word. */
+          if (iswpunct (wmsgstr [i + 1])
+	      && iswalpha (wmsgstr [i -1]))
+            {
+              i++;
+              nbytes = wctomb_nbytes (wmsgstr [i]);
+	      if (nbytes > 1)
+		{
+                  buf = XALLOCAVEC (wchar_t, i + 2);
+                  wcsncpy (buf, wmsgstr, i + 1);
+                  buf [i + 1] = '\0';
+                  len = wcstombs (0, buf, 0);
+	          colwide = wcswidth (buf, i + 1);
+		}
+	      else
+                {
+		  len = len + 1;
+                  colwide = colwide + 1;
+                }
+            }
+          pos = i;
+        }
+    }
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -117,6 +199,13 @@
   return strlen (msgstr);
 }
 
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+  return remaining;
+}
+
 #endif
 
 #endif /* ENABLE_NLS */
Index: opts.c
===================================================================
--- opts.c	(revision 157453)
+++ opts.c	(working copy)
@@ -1158,12 +1158,29 @@
 	   unsigned int columns)
 {
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int remaining, room, len, item_col_width;
+  unsigned int i;
 
   remaining = strlen (help);
+  item_col_width = gcc_gettext_width (item); 
 
   do
     {
+      if (item_width == item_col_width
+          || item_width == 0)
+        printf ("  %-*.*s", col_width, item_width, item);
+      else
+        { 
+          printf ("  %s", item);
+          if (item_col_width < col_width) 
+            { 
+              unsigned int n_spaces = col_width - item_col_width; 
+              const char *space = " ";
+              for (i = 0; i < n_spaces; i++) 
+                printf ("%s", space); 
+            }
+        }
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
@@ -1171,22 +1188,25 @@
 
       if (room < len)
 	{
-	  unsigned int i;
-
-	  for (i = 0; help[i]; i++)
+	  if (len != gcc_gettext_width (help))
+            len = get_aligned_len (help, room, remaining);
+	  else
 	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
+              for (i = 0; help[i]; i++)
+	        {
+	          if (i >= room && len != remaining)
+		    break;
+	          if (help[i] == ' ')
+		    len = i;
+	          else if ((help[i] == '-' || help[i] == '/')
+		           && help[i + 1] != ' '
+		           && i > 0 && ISALPHA (help[i - 1]))
+		    len = i + 1;
+	        }
 	    }
 	}
 
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
+      printf ( " %.*s\n", len, help); 
       item_width = 0;
       while (help[len] == ' ')
 	len++;
@@ -1242,6 +1262,7 @@
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1299,10 @@
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *)alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-15  9:28           ` Shujing Zhao
  2010-03-15 10:58             ` Shujing Zhao
@ 2010-03-15 11:42             ` Joseph S. Myers
  2010-03-17  7:57               ` Shujing Zhao
  1 sibling, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-03-15 11:42 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Mon, 15 Mar 2010, Shujing Zhao wrote:

> +/* Return the number of bytes when converts the wide character WC
> +   to its multibyte representation. */

"Return the number of bytes in the multibyte representation of the wide 
character WC.".

> +  char *pmb = (char *)xmalloc (MB_CUR_MAX);

Missing space between "(char *)" and "xmalloc".

> +/* Return the length of MSGSTR that can be printed within the width ROOM. */ 
> +
> +unsigned int
> +get_aligned_len (const char *msgstr, unsigned int room,
> +	         unsigned int remaining)

You need to say what REMAINING is.

> @@ -1158,12 +1158,31 @@ wrap_help (const char *help,

The comment at the top of this function should be changed to make explicit 
that ITEM_WIDTH is a count of bytes (if that is indeed the case).

> +      if (item_width == item_col_width
> +          || item_width == 0)
> +        printf ("  %-*.*s", col_width, item_width, item);
> +      /* Handle the item that includes wide character. */
> +      else

I think the code would be a lot clearer with fewer conditionals.  Why do 
you need to handle specially the case where item_width == item_col_width?  
Can't you just use the logic below unconditionally?

> +        {
> +          printf ("  %s", item);
> +          /* Output a certain number of spaces for alignment. */
> +          if (item_col_width < col_width) 
> +            { 
> +              unsigned int n_spaces = col_width - item_col_width; 
> +              const char *space = " ";
> +              for (i = 0; i < n_spaces; i++) 
> +                printf ("%s", space); 

You don't actually need a loop here (given a number of spaces 
representable as an "int", which I think you can assume); you can use 
"%*s" as a format with an empty string argument to have printf output the 
number of spaces you computed.

> +	  if (len != gcc_gettext_width (help))
> +            len = get_aligned_len (help, room, remaining);

Again, I'd like to avoid conditionals if possible.

> +              for (i = 0; help[i]; i++)
> +	        {
> +	          if (i >= room && len != remaining)
> +		    break;
> +	          if (help[i] == ' ')
> +		    len = i;
> +	          else if ((help[i] == '-' || help[i] == '/')
> +		           && help[i + 1] != ' '
> +		           && i > 0 && ISALPHA (help[i - 1]))
> +		    len = i + 1;
> +	        }

You now have two copies of line-splitting logic, one in get_aligned_len 
and one in opts.c.  It would be better for the logic to be just in 
get_aligned_len.  You might still need two copies - the one using wide 
character functions and a simpler one where they aren't available - but 
having both in the same file would be better than sometimes using logic 
from one file and sometimes from another file.  (This would mean that 
intl.h no longer has a dummy definition of get_aligned_len; there would 
always be a real function that works out a suitable point to break a 
line.)

> +	  buf = (char *)alloca (len + 1);

Missing space in cast again.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-15 11:42             ` Joseph S. Myers
@ 2010-03-17  7:57               ` Shujing Zhao
  2010-03-17 12:50                 ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-17  7:57 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 3748 bytes --]

On 03/15/2010 07:03 PM, Joseph S. Myers wrote:
> 
> "Return the number of bytes in the multibyte representation of the wide 
> character WC.".
> 
> 
> Missing space between "(char *)" and "xmalloc".
> 
> 
> You need to say what REMAINING is.
> 
>> @@ -1158,12 +1158,31 @@ wrap_help (const char *help,
> 
> The comment at the top of this function should be changed to make explicit 
> that ITEM_WIDTH is a count of bytes (if that is indeed the case).
Fixed at the updated patch. Thanks.
> 
>> +      if (item_width == item_col_width
>> +          || item_width == 0)
>> +        printf ("  %-*.*s", col_width, item_width, item);
>> +      /* Handle the item that includes wide character. */
>> +      else
> 
> I think the code would be a lot clearer with fewer conditionals.  Why do 
> you need to handle specially the case where item_width == item_col_width?  
> Can't you just use the logic below unconditionally?
The conditional item_width == item_col_width is used to handle the ITEM without 
wide characters. As your suggestion, I simplify the logic as

+      int n_spaces = item_width <= item_col_width
+                     ? col_width - item_width
+                     : col_width - item_col_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+        {
+          const char *space = " ";
+          printf ("%*s", n_spaces, space);
+        }

The conditional item_width <= item_col_width is still used to distinguish the 
ITEM with or without wide character. And it is still print the item and spaces 
respectively.
Is it better?
> 
>> +              for (i = 0; i < n_spaces; i++) 
>> +                printf ("%s", space); 
> 
> You don't actually need a loop here (given a number of spaces 
> representable as an "int", which I think you can assume); you can use 
> "%*s" as a format with an empty string argument to have printf output the 
> number of spaces you computed.
Thanks. Fixed.
> 
>> +	  if (len != gcc_gettext_width (help))
>> +            len = get_aligned_len (help, room, remaining);
> 
> Again, I'd like to avoid conditionals if possible.
After the logic is moved to intl.c, the conditional is still kept to tell 
calling get_str_aligned_len or get_wcs_aligned_len.

+  if (gcc_gettext_width (msgstr) == remaining)
+   return get_str_aligned_len (msgstr, room, remaining);
+  else
+   return get_wcs_aligned_len (msgstr, room, remaining);

Is it ok or any suggestion?
> 
> 
> You now have two copies of line-splitting logic, one in get_aligned_len 
> and one in opts.c.  It would be better for the logic to be just in 
> get_aligned_len.  You might still need two copies - the one using wide 
> character functions and a simpler one where they aren't available - but 
> having both in the same file would be better than sometimes using logic 
> from one file and sometimes from another file.  (This would mean that 
> intl.h no longer has a dummy definition of get_aligned_len; there would 
> always be a real function that works out a suitable point to break a 
> line.)
>
Move the two logics to intl.c and adjust the definition of get_aligned_len. 
Define function get_wcs_aligned_len to handle the wide character strings and 
fall back to single characters if --disable-nls or the relevant interfaces are 
not available.

>> +	  buf = (char *)alloca (len + 1);
> 
> Missing space in cast again.
> 
Fixed.

The algorithm of function get_wcs_aligned_len (the old get_aligned_len) is 
adjusted too. The ROOM is not be "expanded". It is counting the column width of 
the sub wide character string that would be printed, and then comparing with the 
ROOM it is given to know if the string is enough long to put in the ROOM. The 
wide character space is also added to check to break the line.

Thanks
Pearly

[-- Attachment #2: 03171200.patch --]
[-- Type: text/x-patch, Size: 7685 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157453)
+++ intl.h	(working copy)
@@ -60,6 +60,8 @@
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+                                     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 157453)
+++ intl.c	(working copy)
@@ -40,6 +40,9 @@
 /* Whether the locale is using UTF-8.  */
 bool locale_utf8 = false;
 
+static unsigned int get_str_aligned_len (const char *, unsigned int,
+					 unsigned int);
+
 #ifdef ENABLE_NLS
 
 /* Initialize the translation library for GCC.  This performs the
@@ -92,6 +95,7 @@
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +110,108 @@
   return wcswidth (wmsgstr, nwcs);
 }
 
+static wchar_t *
+get_sub_wstr (wchar_t *wstr, int len)
+{
+  wchar_t *buf;
+  buf = XNEWVEC (wchar_t, len + 1);
+  wcsncpy (buf, wstr, len);
+  buf [len] = L'\0';
+  return buf;
+}
+
+/* Return the number of bytes in the multibyte representation of the wide 
+   character WC. */
+
+static int
+wctomb_nbytes (wchar_t wc)
+{
+  char *pmb = (char *) xmalloc (MB_CUR_MAX);
+  int nbytes = wctomb (pmb, wc);
+  if (pmb)
+    free (pmb);
+  return nbytes;
+}
+
+/* Return the length of multibyte string MSGSTR that can be printed within the
+   width ROOM. REMAINING is the length of string MSGSTR counted by bytes. */ 
+
+static unsigned int
+get_wcs_aligned_len (const char *msgstr, unsigned int room,
+	         unsigned int remaining)
+{
+  int nbytes;
+  wchar_t *sub_wstr;
+  unsigned int i;
+  unsigned int maybe_width;
+  size_t remaining_width;
+ 
+  unsigned int len = 0;
+  unsigned int pos = 0;
+  unsigned int colwidth = 0;
+
+  int len_bk = len;
+  
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  remaining_width = wcswidth (wmsgstr, nwcs);
+  
+  for (i = 0; i < nwcs; i++)
+    {
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      maybe_width = colwidth + i - pos;
+      if (maybe_width == remaining_width)
+        len = remaining;
+      else if (colwidth == room || maybe_width >= room)
+        break;
+      else if (colwidth > room)
+        {
+          len = len_bk;
+          break;
+        }
+      else if (iswspace (wmsgstr [i]))
+        {
+          sub_wstr = get_sub_wstr (wmsgstr, i);
+          len_bk = len;
+          len = wcstombs (0, sub_wstr, 0);
+          colwidth = maybe_width;
+          pos = i;
+          free (sub_wstr);
+        }
+      else if (nbytes > 1)
+        {
+          sub_wstr = get_sub_wstr (wmsgstr, i + 1);
+          len_bk = len;
+          len = wcstombs (0, sub_wstr, 0);
+          colwidth = wcswidth (sub_wstr, i + 1);
+          free (sub_wstr);
+	  /* Keep the puncts printed with the previous word. */
+          if (iswpunct (wmsgstr [i + 1])
+	      && iswalpha (wmsgstr [i -1]))
+            {
+              i++;
+              nbytes = wctomb_nbytes (wmsgstr [i]);
+	      if (nbytes > 1)
+		{
+                  sub_wstr = get_sub_wstr (wmsgstr, i + 1);
+                  len = wcstombs (0, sub_wstr, 0);
+	          colwidth = wcswidth (sub_wstr, i + 1);
+                  free (sub_wstr);
+		}
+	      else
+                {
+		  len = len + 1;
+                  colwidth = colwidth + 1;
+                }
+            }
+          pos = i;
+        }
+    } 
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -117,6 +223,13 @@
   return strlen (msgstr);
 }
 
+static unsigned int
+get_wcs_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+   return get_str_aligned_len (msgstr, room, remaining);
+}
+
 #endif
 
 #endif /* ENABLE_NLS */
@@ -132,6 +245,13 @@
   return plural;
 }
 
+static unsigned int
+get_wcs_aligned_len (const char *msgstr, unsigned int room,
+                     unsigned int remaining)
+{
+  return get_str_aligned_len (msgstr, room, remaining);
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
@@ -148,5 +268,36 @@
    return spaces;
 }
 
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
 
+static unsigned int
+get_str_aligned_len (const char *msgstr, unsigned int room,
+                     unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+        break;
+      if (msgstr[i] == ' ')
+        len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+               && msgstr[i + 1] != ' '
+               && i > 0 && ISALPHA (msgstr[i - 1]))
+        len = i + 1;
+    }
+  return len;
+}
 
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+  if (gcc_gettext_width (msgstr) == remaining)
+   return get_str_aligned_len (msgstr, room, remaining);
+  else
+   return get_wcs_aligned_len (msgstr, room, remaining);    
+}
+
Index: opts.c
===================================================================
--- opts.c	(revision 157453)
+++ opts.c	(working copy)
@@ -1149,7 +1149,7 @@
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
+/* Output ITEM, of length ITEM_WIDTH counted by bytes, in the left column,
    followed by word-wrapped HELP in a second column.  */
 static void
 wrap_help (const char *help,
@@ -1158,36 +1158,34 @@
 	   unsigned int columns)
 {
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int remaining, room, len, item_col_width;
 
   remaining = strlen (help);
+  item_col_width = gcc_gettext_width (item); 
 
   do
     {
+      int n_spaces = item_width <= item_col_width 
+                     ? col_width - item_width
+                     : col_width - item_col_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+        {
+          const char *space = " ";
+          printf ("%*s", n_spaces, space);
+        } 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
+        len = get_aligned_len (help, room, remaining);
 
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
-
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1242,6 +1240,7 @@
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1277,10 @@
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-17  7:57               ` Shujing Zhao
@ 2010-03-17 12:50                 ` Joseph S. Myers
  2010-03-18 11:48                   ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-03-17 12:50 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Wed, 17 Mar 2010, Shujing Zhao wrote:

> > > +      if (item_width == item_col_width
> > > +          || item_width == 0)
> > > +        printf ("  %-*.*s", col_width, item_width, item);
> > > +      /* Handle the item that includes wide character. */
> > > +      else
> > 
> > I think the code would be a lot clearer with fewer conditionals.  Why do you
> > need to handle specially the case where item_width == item_col_width?  Can't
> > you just use the logic below unconditionally?
> The conditional item_width == item_col_width is used to handle the ITEM
> without wide characters. As your suggestion, I simplify the logic as

But you should be able to treat all ITEMs as if they have wide characters; 
there should be no need to do something special for the case of no wide 
characters.

> > > +	  if (len != gcc_gettext_width (help))
> > > +            len = get_aligned_len (help, room, remaining);
> > 
> > Again, I'd like to avoid conditionals if possible.
> After the logic is moved to intl.c, the conditional is still kept to tell
> calling get_str_aligned_len or get_wcs_aligned_len.
> 
> +  if (gcc_gettext_width (msgstr) == remaining)
> +   return get_str_aligned_len (msgstr, room, remaining);
> +  else
> +   return get_wcs_aligned_len (msgstr, room, remaining);
> 
> Is it ok or any suggestion?

The check still isn't appropriate.  The width might equal the number of 
bytes even when wide characters are present.  I think the logic should be: 
if NLS enabled and the relevant wide character functions available, use 
get_wcs_aligned_len, otherwise use get_str_aligned_len.  This means only 
one of those functions is used in any compiler configuration, so you can 
just make them two different implementations of get_aligned_len depending 
on the configuration, rather than having three function names involved.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-17 12:50                 ` Joseph S. Myers
@ 2010-03-18 11:48                   ` Shujing Zhao
  2010-03-18 17:54                     ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-18 11:48 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 1384 bytes --]

On 03/17/2010 08:04 PM, Joseph S. Myers wrote:
> 
> But you should be able to treat all ITEMs as if they have wide characters; 
> there should be no need to do something special for the case of no wide 
> characters.
> 
> The check still isn't appropriate.  The width might equal the number of 
> bytes even when wide characters are present.  I think the logic should be: 
> if NLS enabled and the relevant wide character functions available, use 
> get_wcs_aligned_len, otherwise use get_str_aligned_len.  This means only 
> one of those functions is used in any compiler configuration, so you can 
> just make them two different implementations of get_aligned_len depending 
> on the configuration, rather than having three function names involved.
> 
Thanks. I merge the algorithm of line break in function get_aligned_len and 
don't care if it includes wide characters. Yes, not only the code, but also the 
logic is clearer.
I add a separate conditional directive to get the aligned length when 
--diable-nls or the relevant interfaces are not available. The definition of 
function fake_ngettext is moved to the inclusions of ENABLE_NLS, not used a 
separate "ifndef ENABLE_NLS". That would use less conditionals at intl.c.

Tested on i686-pc-linux and enabled nls, the help output isn't changed when the 
locale is C. Disabled nls, the output isn't changed too.

Thanks
Pearly

[-- Attachment #2: 03181811.patch --]
[-- Type: text/x-patch, Size: 6917 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157508)
+++ intl.h	(working copy)
@@ -60,6 +60,8 @@
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+                                     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 157508)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,102 @@
   return wcswidth (wmsgstr, nwcs);
 }
 
+static wchar_t *
+get_sub_wstr (wchar_t *wstr, int len)
+{
+  wchar_t *buf;
+  buf = XNEWVEC (wchar_t, len + 1);
+  wcsncpy (buf, wstr, len);
+  buf [len] = L'\0';
+  return buf;
+}
+
+/* Return the number of bytes in the multibyte representation of the wide 
+   character WC. */
+
+static int
+wctomb_nbytes (wchar_t wc)
+{
+  char *pmb = (char *) xmalloc (MB_CUR_MAX);
+  int nbytes = wctomb (pmb, wc);
+  if (pmb)
+    free (pmb);
+  return nbytes;
+}
+
+/* Return the length of multibyte string MSGSTR that can be printed within the
+   width ROOM. REMAINING is the length of string MSGSTR counted by bytes. */ 
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+	         unsigned int remaining)
+{
+  int nbytes;
+  wchar_t *sub_wstr;
+  unsigned int i;
+  size_t remaining_width;
+ 
+  unsigned int len = 0;
+  unsigned int colwidth = 0;
+
+  int len_bk;
+  
+  size_t nwcs = mbstowcs (0, msgstr, 0);
+  wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1);
+
+  mbstowcs (wmsgstr, msgstr, nwcs + 1);
+  remaining_width = wcswidth (wmsgstr, nwcs);
+  
+  for (i = 0; i < nwcs; i++)
+    { 
+      nbytes = wctomb_nbytes (wmsgstr [i]);
+      colwidth ++;
+
+      if (colwidth == remaining_width)
+        len = remaining;
+      else if (iswspace (wmsgstr [i]))
+        {
+          sub_wstr = get_sub_wstr (wmsgstr, i);
+          len_bk = len = wcstombs (0, sub_wstr, 0);
+          free (sub_wstr);
+        }
+      else if ((wmsgstr[i] == L'-' || wmsgstr[i] == L'/')
+               && msgstr[i + 1] != L' '
+               && i > 0 && iswalpha (msgstr[i - 1]))
+        {
+	  sub_wstr = get_sub_wstr (wmsgstr, i + 1);
+          len_bk = len = wcstombs (0, sub_wstr, 0);
+          colwidth = wcswidth (sub_wstr, i + 1);
+          free (sub_wstr);
+        }
+      else if (nbytes > 1)
+        {
+          sub_wstr = get_sub_wstr (wmsgstr, i + 1);
+          len_bk = len;
+          len = wcstombs (0, sub_wstr, 0);
+          colwidth = wcswidth (sub_wstr, i + 1);
+	  free (sub_wstr);
+	  /* Keep the puncts printed with the previous word. */
+          if (iswpunct (wmsgstr [i + 1])
+	      && iswalpha (wmsgstr [i -1]))
+            {
+              i++;
+              sub_wstr = get_sub_wstr (wmsgstr, i + 1);
+              len = wcstombs (0, sub_wstr, 0);
+	      colwidth = wcswidth (sub_wstr, i + 1);
+              free (sub_wstr);
+            }
+        }
+        if (colwidth >= room)
+          {
+	    if (colwidth > room)	
+              len = len_bk;
+            break;
+	  }
+    } 
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,10 +216,8 @@
 
 #endif
 
-#endif /* ENABLE_NLS */
+#else /* Not defined ENABLE_NLS */
 
-#ifndef ENABLE_NLS
-
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
 {
@@ -132,6 +227,35 @@
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+                               && defined HAVE_WORKING_MBSTOWCS \
+                               && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+                 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+        break;
+      if (msgstr[i] == ' ')
+        len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+               && msgstr[i + 1] != ' '
+               && i > 0 && ISALPHA (msgstr[i - 1]))
+        len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 157508)
+++ opts.c	(working copy)
@@ -1149,7 +1149,7 @@
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
+/* Output ITEM, of length ITEM_WIDTH counted by bytes, in the left column,
    followed by word-wrapped HELP in a second column.  */
 static void
 wrap_help (const char *help,
@@ -1158,36 +1158,34 @@
 	   unsigned int columns)
 {
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int remaining, room, len, item_col_width;
 
   remaining = strlen (help);
+  item_col_width = gcc_gettext_width (item); 
 
   do
     {
+      int n_spaces = item_width <= item_col_width 
+                     ? col_width - item_width
+                     : col_width - item_col_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+        {
+          const char *space = " ";
+          printf ("%*s", n_spaces, space);
+        } 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
+        len = get_aligned_len (help, room, remaining);
 
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
-
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1242,6 +1240,7 @@
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1277,10 @@
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-18 11:48                   ` Shujing Zhao
@ 2010-03-18 17:54                     ` Joseph S. Myers
  2010-03-19 11:43                       ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-03-18 17:54 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Thu, 18 Mar 2010, Shujing Zhao wrote:

>  
> +static wchar_t *
> +get_sub_wstr (wchar_t *wstr, int len)

Needs comment explaining semantics of parameters and return value.

> +/* Return the number of bytes in the multibyte representation of the wide 
> +   character WC. */
> +
> +static int
> +wctomb_nbytes (wchar_t wc)
> +{

This code is heavily assuming that converting from multibyte to wide 
characters and back again yields exactly the same multibyte characters.  
This is more likely in the Unicode world than it used to be, but it's not 
clear we should be presuming it.  I think code without this presumption 
would be simpler than code with it; rather than converting the whole 
string at once and then repeatedly extracting substrings, you'd convert 
one character at a time, keeping track of the width at each point 
(computing the width of one character at a time) and of the byte count to 
the last possible point for a line break.  (That way, you could also 
easily allow for encodings with shift sequences, keeping an mbstate_t in 
the course of processing the string.)

> +/* Return the length of multibyte string MSGSTR that can be printed within the

"length" = number of bytes?

> +    { 
> +      nbytes = wctomb_nbytes (wmsgstr [i]);
> +      colwidth ++;

Why this increment?  Characters may have width 0 (combining characters) or 
more than 1 and it's not clear why adding 1 should be the correct thing to 
do here.

> +      else if (nbytes > 1)

Why this conditional?  Whatever the number of bytes is, you need to update 
the width appropriately.

> +          {
> +	    if (colwidth > room)	
> +              len = len_bk;
> +            break;

Indentation seems very confused.  Make sure that new code consistently 
uses TABs for indenting source code lines.

> +      int n_spaces = item_width <= item_col_width 
> +                     ? col_width - item_width
> +                     : col_width - item_col_width;

I don't see why this conditional is here.  For printing spaces, it's 
always the number of columns that's relevant; the number of bytes is 
irrelevant.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-18 17:54                     ` Joseph S. Myers
@ 2010-03-19 11:43                       ` Shujing Zhao
  2010-03-19 13:44                         ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-19 11:43 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 2959 bytes --]

On 03/19/2010 01:03 AM, Joseph S. Myers wrote:
> This code is heavily assuming that converting from multibyte to wide 
> characters and back again yields exactly the same multibyte characters.  
> This is more likely in the Unicode world than it used to be, but it's not 
> clear we should be presuming it.  I think code without this presumption 
> would be simpler than code with it; rather than converting the whole 
> string at once and then repeatedly extracting substrings, you'd convert 
> one character at a time, keeping track of the width at each point 
> (computing the width of one character at a time) and of the byte count to 
> the last possible point for a line break.  (That way, you could also 
> easily allow for encodings with shift sequences, keeping an mbstate_t in 
> the course of processing the string.)
Yes! Thanks again! The source codes are clearer and simpler while choose your 
method to convert one character at a time. All static functions I have added are 
  removed. Only the main function get_aligned_len are left.
>> +    { 
>> +      nbytes = wctomb_nbytes (wmsgstr [i]);
>> +      colwidth ++;
> 
> Why this increment?  Characters may have width 0 (combining characters) or 
> more than 1 and it's not clear why adding 1 should be the correct thing to 
> do here.
It is changed to *length--* at the new logic.
+      colwidth += wcwidth (wc[0]);
+      length--;
+      if (length == 0)
+       len = remaining;
I have tried to used length = length - nbytes to get the new length. But if 
nbytes > 1 and the character is the last character of the string (length will be 
0 if it decreases nbytes), it would not assign the last variable *len* to 
*len_bk*, then the character that maybe put into the ROOM would not be printed. 
That is to say, it presumes the nbytes is 1. If (nbytes > 1), it would be 
handled specially (just to update the value of len_bk). At the other 
conditionals, len_bk is always kept consistently with len.
> 
>> +      else if (nbytes > 1)
> 
> Why this conditional?  Whatever the number of bytes is, you need to update 
> the width appropriately.
At the new logic, the width is always be updated every time when convert one 
character.
> 
> Indentation seems very confused.  Make sure that new code consistently 
> uses TABs for indenting source code lines.
Thanks again. All the lines of the updated patch is checked to use TABs for 
indenting.
> 
>> +      int n_spaces = item_width <= item_col_width 
>> +                     ? col_width - item_width
>> +                     : col_width - item_col_width;
> 
> I don't see why this conditional is here.  For printing spaces, it's 
> always the number of columns that's relevant; the number of bytes is 
> irrelevant.
Yes! You are right! The gcc_gettext_width would always return the width whatever 
the string is. It seems the old argument ITEM_WIDTH is not needed anymore. I 
removed it from the function wrap_help.

Is it ok?

Thanks
Pearly

[-- Attachment #2: 03191800.patch --]
[-- Type: text/x-patch, Size: 6082 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157508)
+++ intl.h	(working copy)
@@ -60,6 +60,8 @@
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+				     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 157508)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,73 @@
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Return the number of bytes of multibyte string MSGSTR 
+   that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */ 
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  size_t nbytes;
+  wchar_t wc[1];
+ 
+  size_t pre_nbytes;
+  wchar_t pre_wc;
+
+  unsigned int len = 0;
+  unsigned int colwidth = 0;
+  unsigned int len_bk;
+
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+  
+  int length = remaining;
+  
+  while ((nbytes = mbrtowc (wc, msgstr, length, &state)) > 0)
+    {
+      if (nbytes >= (size_t) -2)
+	break;
+      colwidth += wcwidth (wc[0]);
+      length--;
+      if (length == 0)
+	len = remaining;
+      else if (iswspace (wc[0]))
+	len_bk = len = remaining - length - nbytes;
+      else  if ((wc[0] == L'-' || wc[0] == L'/')
+	        && iswalpha (pre_wc))
+	len_bk = len = remaining - length;
+      else if (nbytes > 1)
+	{
+	  len_bk = len;
+	  length = length - nbytes + 1;
+	  len = remaining - length;
+	  pre_wc = wc[0];
+	  pre_nbytes = nbytes;
+	  if ((nbytes = mbrtowc (wc, msgstr + nbytes, length, &state)) > 0)
+	    {
+	      if (iswpunct (wc[0]) && iswalpha (pre_wc))
+		{
+		  len = remaining - length + nbytes;
+		  colwidth += wcwidth(wc[0]);
+		  msgstr += nbytes;
+		  length -= nbytes;
+		}
+	      nbytes = pre_nbytes;
+	    }
+	}
+      if (colwidth >= room)
+	{
+	  if (colwidth > room)
+	    len = len_bk;
+	  break;
+	}
+      msgstr += nbytes;
+      pre_wc = wc[0];
+    }
+  return len;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,10 +187,8 @@
 
 #endif
 
-#endif /* ENABLE_NLS */
+#else /* Not defined ENABLE_NLS */
 
-#ifndef ENABLE_NLS
-
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
 {
@@ -132,6 +198,35 @@
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+			       && defined HAVE_WORKING_MBSTOWCS \
+			       && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+	break;
+      if (msgstr[i] == ' ')
+	len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+	       && msgstr[i + 1] != ' '
+	       && i > 0 && ISALPHA (msgstr[i - 1]))
+	len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 157508)
+++ opts.c	(working copy)
@@ -1149,45 +1149,40 @@
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
-   followed by word-wrapped HELP in a second column.  */
+/* Output ITEM in the left column, followed by
+   word-wrapped HELP in a second column.  */
 static void
 wrap_help (const char *help,
 	   const char *item,
-	   unsigned int item_width,
 	   unsigned int columns)
 {
-  unsigned int col_width = LEFT_COLUMN;
   unsigned int remaining, room, len;
+  unsigned int col_width = LEFT_COLUMN;
+  unsigned int item_width = gcc_gettext_width (item);
 
   remaining = strlen (help);
 
   do
     {
+      int n_spaces = col_width - item_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+	{
+	  const char *space = " ";
+	  printf ("%*s", n_spaces, space);
+	} 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
+	len = get_aligned_len (help, room, remaining);
 
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
-
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1226,7 +1221,7 @@
 	  /* Get the translation.  */
 	  help = _(help);
 
-	  wrap_help (help, param, strlen (param), columns);
+	  wrap_help (help, param, columns);
 	}
       putchar ('\n');
       return;
@@ -1242,6 +1237,7 @@
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1274,10 @@
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else
@@ -1319,7 +1318,7 @@
 	  help = new_help;
 	}
 
-      wrap_help (help, opt, len, columns);
+      wrap_help (help, opt, columns);
       displayed = true;
     }
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-19 11:43                       ` Shujing Zhao
@ 2010-03-19 13:44                         ` Joseph S. Myers
  2010-03-22  9:17                           ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-03-19 13:44 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Fri, 19 Mar 2010, Shujing Zhao wrote:

> > Why this increment?  Characters may have width 0 (combining characters) or
> > more than 1 and it's not clear why adding 1 should be the correct thing to
> > do here.
> It is changed to *length--* at the new logic.
> +      colwidth += wcwidth (wc[0]);
> +      length--;
> +      if (length == 0)
> +       len = remaining;
> I have tried to used length = length - nbytes to get the new length. But if
> nbytes > 1 and the character is the last character of the string (length will
> be 0 if it decreases nbytes), it would not assign the last variable *len* to
> *len_bk*, then the character that maybe put into the ROOM would not be
> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it
> would be handled specially (just to update the value of len_bk). At the other
> conditionals, len_bk is always kept consistently with len.

I'm afraid none of this makes any sense to me, because I have no idea what 
"length", "len" and "len_bk" are meant to be, and how they differ.  You 
need clearer variable names.  I think you also need comments at various 
points stating what the values various variables mean at those points, and 
maybe an overview comment about the algorithm used.  It would be best to 
avoid ambiguous terms such as "length" at all points in variable names and 
comments, instead talking about numbers of bytes difference between two 
points, or similar.

For example, the present algorithm in opts.c:wrap_help might be described 
thus:

/* Look for a suitable point at which to break HELP.  If there is one 
   after at most ROOM columns, take the latest such point.  Otherwise, 
   take the first such point in the string, or the end of the string if it 
   cannot be broken.  */

If the new algorithm is still like that, you may just need to update 
variable names (and explain what the interface is for indicating both how 
much of the string to print, and how much (spaces) to skip before starting 
on the next line).

The key variables to explain are LEN and I.  At each iteration in the 
present loop, something like this might explain them.

/* The best candidate point found so far to break HELP is after LEN 
   bytes, where either LEN < I, or LEN == REMAINING if the string cannot 
   be broken within the first I bytes.  If we are outside ROOM columns and 
   have found somewhere to break the string, that is the best point 
   available; otherwise, breaking after I bytes will be better if that is 
   a valid point to break.  */

Now, things will be a bit more complicated in that you need to track the 
current column width as well as the current number of bytes of the string 
that take up that width.  But I think the basic idea is still the same - 
it is just that "length", "len" and "len_bk" are not useful variable names 
to distinguish the different quantities being tracked and so make clear 
whether particular arithmetic you are doing on them makes logical sense.  
I remain very suspicious of anything modifying a byte counter by 1 in this 
context; if you are ever modifying a byte counter by something other than 
the number of bytes in the relevant multibyte character I think you are 
confused and need to sort out the logic of the code so you don't need to 
do so.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-19 13:44                         ` Joseph S. Myers
@ 2010-03-22  9:17                           ` Shujing Zhao
  2010-03-31  7:03                             ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-22  9:17 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 4299 bytes --]

On 03/19/2010 09:26 PM, Joseph S. Myers wrote:
> On Fri, 19 Mar 2010, Shujing Zhao wrote:
> 
>>> Why this increment?  Characters may have width 0 (combining characters) or
>>> more than 1 and it's not clear why adding 1 should be the correct thing to
>>> do here.
>> It is changed to *length--* at the new logic.
>> +      colwidth += wcwidth (wc[0]);
>> +      length--;
>> +      if (length == 0)
>> +       len = remaining;
>> I have tried to used length = length - nbytes to get the new length. But if
>> nbytes > 1 and the character is the last character of the string (length will
>> be 0 if it decreases nbytes), it would not assign the last variable *len* to
>> *len_bk*, then the character that maybe put into the ROOM would not be
>> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it
>> would be handled specially (just to update the value of len_bk). At the other
>> conditionals, len_bk is always kept consistently with len.
> 
> I'm afraid none of this makes any sense to me, because I have no idea what 
> "length", "len" and "len_bk" are meant to be, and how they differ.  You 
> need clearer variable names.  I think you also need comments at various 
> points stating what the values various variables mean at those points, and 
> maybe an overview comment about the algorithm used.  It would be best to 
> avoid ambiguous terms such as "length" at all points in variable names and 
> comments, instead talking about numbers of bytes difference between two 
> points, or similar.
> 
> For example, the present algorithm in opts.c:wrap_help might be described 
> thus:
> 
> /* Look for a suitable point at which to break HELP.  If there is one 
>    after at most ROOM columns, take the latest such point.  Otherwise, 
>    take the first such point in the string, or the end of the string if it 
>    cannot be broken.  */
> 
> If the new algorithm is still like that, you may just need to update 
> variable names (and explain what the interface is for indicating both how 
> much of the string to print, and how much (spaces) to skip before starting 
> on the next line).
> 
> The key variables to explain are LEN and I.  At each iteration in the 
> present loop, something like this might explain them.
> 
> /* The best candidate point found so far to break HELP is after LEN 
>    bytes, where either LEN < I, or LEN == REMAINING if the string cannot 
>    be broken within the first I bytes.  If we are outside ROOM columns and 
>    have found somewhere to break the string, that is the best point 
>    available; otherwise, breaking after I bytes will be better if that is 
>    a valid point to break.  */
> 
> Now, things will be a bit more complicated in that you need to track the 
> current column width as well as the current number of bytes of the string 
> that take up that width.  But I think the basic idea is still the same - 
> it is just that "length", "len" and "len_bk" are not useful variable names 
> to distinguish the different quantities being tracked and so make clear 
> whether particular arithmetic you are doing on them makes logical sense.  
Thanks. I changed the "len_bk" to "pre_len_to_print" as the number of bytes that 
have been found can be printed at ROOM columns, "len" to "cur_len_to_print" as 
the possible number of bytes that can be printed at ROOM columns, "length" to 
left_len as the number of bytes that MSGSTR left to convert to wide character.
Some comments are also added as your suggestion.

> I remain very suspicious of anything modifying a byte counter by 1 in this 
> context; if you are ever modifying a byte counter by something other than 
> the number of bytes in the relevant multibyte character I think you are 
> confused and need to sort out the logic of the code so you don't need to 
> do so.
Thanks. You are right. I found the way to solve the problem that I have 
mentioned when I explain why I have to use "length--". It just needs to store 
the previous point that can be break the string before the new point is set to 
the end of the string.

+      left_len -= nbytes;
+
+      /* Compute the point to break the string. */
+      if (left_len == 0)
+       {
+         pre_len_to_print = cur_len_to_print;
+         cur_len_to_print = remaining;
+       }

Is it Ok?

Thanks
Pearly

[-- Attachment #2: 0322.patch --]
[-- Type: text/x-patch, Size: 7982 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157508)
+++ intl.h	(working copy)
@@ -60,6 +60,8 @@
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+				     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 157508)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,99 @@
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Look for a suitable point at which to break string MSGSTR. If there is one 
+   after at most ROOM columns, take the latest such point. Otherwise, 
+   take the first such point in the string, or the end of the string if it 
+   cannot be broken.
+   REMAINING is the number of bytes of MSGSTR.
+   Return the number of bytes before the point. */
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  size_t nbytes;
+  wchar_t wc[1]; 
+  size_t pre_nbytes;
+  wchar_t pre_wc;
+
+  unsigned int colwidth = 0;
+
+  /* The number of bytes that have been found can be printed at ROOM columns. */
+  unsigned int pre_len_to_print;
+
+  /* The possible number of bytes that can be printed at ROOM columns. */
+  unsigned int cur_len_to_print = 0;
+
+  /* The number of bytes that MSGSTR left to convert to wide character. */
+  int left_len = remaining;
+
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  /* Convert one character at a time, keep track of the width at each point 
+     and of the byte count to the last possible point for a line break. The 
+     best candidate point found so far to break MSGSTR is after the space,
+     some puncts or multibyte character with the number of bytes
+     CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere
+     PRE_LEN_TO_PRINT to break the string, that is the best point available;
+     otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
+     is a valid point to break. */ 
+  while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
+    {
+      if (nbytes >= (size_t) -2)
+	/* Invalide input string. */
+	break;
+      
+      colwidth += wcwidth (wc[0]);
+      left_len -= nbytes;
+
+      /* Compute the point to break the string. */
+      if (left_len == 0)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining;
+	}
+      else if (iswspace (wc[0]))
+	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
+      else  if ((wc[0] == L'-' || wc[0] == L'/')
+	        && iswalpha (pre_wc))
+	pre_len_to_print = cur_len_to_print = remaining - left_len;
+      else if (nbytes > 1)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining - left_len;
+	  pre_wc = wc[0];
+	  pre_nbytes = nbytes;
+	  /* Keep the puncts printed with the previous word. */
+	  if ((nbytes = mbrtowc (wc, msgstr + nbytes, left_len, &state)) > 0)
+	    {
+	      if (nbytes >= (size_t) -2)
+		break;
+	      if (iswpunct (wc[0]) && iswalpha (pre_wc))
+		{
+		  cur_len_to_print = remaining - left_len + nbytes;
+		  colwidth += wcwidth (wc[0]);
+		  msgstr += nbytes;
+		  left_len -= nbytes;
+		}
+	      nbytes = pre_nbytes;
+	    }
+	}
+
+      if (colwidth >= room)
+	{
+	  /* If the string is outside ROOM columns after counted the last
+	     word, the break point will be set to the previous point it kept. */
+	  if (colwidth > room)
+	    cur_len_to_print = pre_len_to_print;
+	  break;
+	}
+      msgstr += nbytes;
+      pre_wc = wc[0];
+    }
+  return cur_len_to_print;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,10 +213,8 @@
 
 #endif
 
-#endif /* ENABLE_NLS */
+#else /* Not defined ENABLE_NLS */
 
-#ifndef ENABLE_NLS
-
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
 {
@@ -132,6 +224,35 @@
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+			       && defined HAVE_WORKING_MBSTOWCS \
+			       && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+	break;
+      if (msgstr[i] == ' ')
+	len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+	       && msgstr[i + 1] != ' '
+	       && i > 0 && ISALPHA (msgstr[i - 1]))
+	len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 157508)
+++ opts.c	(working copy)
@@ -1149,45 +1149,45 @@
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
-   followed by word-wrapped HELP in a second column.  */
+/* Output ITEM in the left column and HELP in the second column with the total
+   COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the
+   corresponding number of spaces for alignment. If the ROOM is less than
+   REMAINING, look for a suitable point at which to break HELP, Otherwise, take
+   take the end of the string.
+   If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in
+   the left column from the secode line. */
 static void
 wrap_help (const char *help,
 	   const char *item,
-	   unsigned int item_width,
 	   unsigned int columns)
 {
+  unsigned int remaining, room, len;
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int item_width = gcc_gettext_width (item);
 
   remaining = strlen (help);
 
   do
     {
+      int n_spaces = col_width - item_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+	{
+	  const char *space = " ";
+	  printf ("%*s", n_spaces, space);
+	} 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
+	len = get_aligned_len (help, room, remaining);
 
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
-
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1226,7 +1226,7 @@
 	  /* Get the translation.  */
 	  help = _(help);
 
-	  wrap_help (help, param, strlen (param), columns);
+	  wrap_help (help, param, columns);
 	}
       putchar ('\n');
       return;
@@ -1242,6 +1242,7 @@
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1279,10 @@
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else
@@ -1319,7 +1323,7 @@
 	  help = new_help;
 	}
 
-      wrap_help (help, opt, len, columns);
+      wrap_help (help, opt, columns);
       displayed = true;
     }
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-22  9:17                           ` Shujing Zhao
@ 2010-03-31  7:03                             ` Shujing Zhao
  2010-04-06 16:56                               ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-03-31  7:03 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: Joseph S. Myers, GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 4524 bytes --]

On 03/22/2010 04:41 PM, Shujing Zhao wrote:
> On 03/19/2010 09:26 PM, Joseph S. Myers wrote:
>> On Fri, 19 Mar 2010, Shujing Zhao wrote:
>>
>>>> Why this increment?  Characters may have width 0 (combining 
>>>> characters) or
>>>> more than 1 and it's not clear why adding 1 should be the correct 
>>>> thing to
>>>> do here.
>>> It is changed to *length--* at the new logic.
>>> +      colwidth += wcwidth (wc[0]);
>>> +      length--;
>>> +      if (length == 0)
>>> +       len = remaining;
>>> I have tried to used length = length - nbytes to get the new length. 
>>> But if
>>> nbytes > 1 and the character is the last character of the string 
>>> (length will
>>> be 0 if it decreases nbytes), it would not assign the last variable 
>>> *len* to
>>> *len_bk*, then the character that maybe put into the ROOM would not be
>>> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 
>>> 1), it
>>> would be handled specially (just to update the value of len_bk). At 
>>> the other
>>> conditionals, len_bk is always kept consistently with len.
>>
>> I'm afraid none of this makes any sense to me, because I have no idea 
>> what "length", "len" and "len_bk" are meant to be, and how they 
>> differ.  You need clearer variable names.  I think you also need 
>> comments at various points stating what the values various variables 
>> mean at those points, and maybe an overview comment about the 
>> algorithm used.  It would be best to avoid ambiguous terms such as 
>> "length" at all points in variable names and comments, instead talking 
>> about numbers of bytes difference between two points, or similar.
>>
>> For example, the present algorithm in opts.c:wrap_help might be 
>> described thus:
>>
>> /* Look for a suitable point at which to break HELP.  If there is one 
>>    after at most ROOM columns, take the latest such point.  Otherwise, 
>>    take the first such point in the string, or the end of the string 
>> if it    cannot be broken.  */
>>
>> If the new algorithm is still like that, you may just need to update 
>> variable names (and explain what the interface is for indicating both 
>> how much of the string to print, and how much (spaces) to skip before 
>> starting on the next line).
>>
>> The key variables to explain are LEN and I.  At each iteration in the 
>> present loop, something like this might explain them.
>>
>> /* The best candidate point found so far to break HELP is after LEN    
>> bytes, where either LEN < I, or LEN == REMAINING if the string cannot 
>>    be broken within the first I bytes.  If we are outside ROOM columns 
>> and    have found somewhere to break the string, that is the best 
>> point    available; otherwise, breaking after I bytes will be better 
>> if that is    a valid point to break.  */
>>
>> Now, things will be a bit more complicated in that you need to track 
>> the current column width as well as the current number of bytes of the 
>> string that take up that width.  But I think the basic idea is still 
>> the same - it is just that "length", "len" and "len_bk" are not useful 
>> variable names to distinguish the different quantities being tracked 
>> and so make clear whether particular arithmetic you are doing on them 
>> makes logical sense.  
> Thanks. I changed the "len_bk" to "pre_len_to_print" as the number of 
> bytes that have been found can be printed at ROOM columns, "len" to 
> "cur_len_to_print" as the possible number of bytes that can be printed 
> at ROOM columns, "length" to left_len as the number of bytes that MSGSTR 
> left to convert to wide character.
> Some comments are also added as your suggestion.
> 
>> I remain very suspicious of anything modifying a byte counter by 1 in 
>> this context; if you are ever modifying a byte counter by something 
>> other than the number of bytes in the relevant multibyte character I 
>> think you are confused and need to sort out the logic of the code so 
>> you don't need to do so.
> Thanks. You are right. I found the way to solve the problem that I have 
> mentioned when I explain why I have to use "length--". It just needs to 
> store the previous point that can be break the string before the new 
> point is set to the end of the string.
> 
> +      left_len -= nbytes;
> +
> +      /* Compute the point to break the string. */
> +      if (left_len == 0)
> +       {
> +         pre_len_to_print = cur_len_to_print;
> +         cur_len_to_print = remaining;
> +       }
> 
> Is it Ok?
> 
> Thanks
> Pearly
> 
Added the ChangeLog.
Is it OK?

[-- Attachment #2: ChangeLog --]
[-- Type: text/plain, Size: 393 bytes --]

2010-03-31  Shujing Zhao  <pearly.zhao@oracle.com>

	* intl.c (get_aligned_len): New function.
	* intl.h (get_aligned_len): New prototype.
	* opts.c (wrap_help): Remove the ITEM_WIDTH parameter. Print the
	option name and help text respectively and add spaces between them
	when needed.
	(print_filtered_help): Split the help string to option name and help
	text and adjust wrap_help caller.


[-- Attachment #3: 0331.patch --]
[-- Type: text/x-patch, Size: 8370 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 157720)
+++ intl.h	(working copy)
@@ -60,6 +60,8 @@ extern const char *fake_ngettext(const c
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+				     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 157720)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,99 @@ gcc_gettext_width (const char *msgstr)
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Look for a suitable point at which to break string MSGSTR. If there is one 
+   after at most ROOM columns, take the latest such point. Otherwise, 
+   take the first such point in the string, or the end of the string if it 
+   cannot be broken.
+   REMAINING is the number of bytes of MSGSTR.
+   Return the number of bytes before the point. */
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  size_t nbytes;
+  wchar_t wc[1]; 
+  size_t pre_nbytes;
+  wchar_t pre_wc;
+
+  unsigned int colwidth = 0;
+
+  /* The number of bytes that have been found can be printed at ROOM columns. */
+  unsigned int pre_len_to_print;
+
+  /* The possible number of bytes that can be printed at ROOM columns. */
+  unsigned int cur_len_to_print = 0;
+
+  /* The number of bytes that MSGSTR left to convert to wide character. */
+  int left_len = remaining;
+
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  /* Convert one character at a time, keep track of the width at each point 
+     and of the byte count to the last possible point for a line break. The 
+     best candidate point found so far to break MSGSTR is after the space,
+     some puncts or multibyte character with the number of bytes
+     CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere
+     PRE_LEN_TO_PRINT to break the string, that is the best point available;
+     otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
+     is a valid point to break. */ 
+  while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
+    {
+      if (nbytes >= (size_t) -2)
+	/* Invalide input string. */
+	break;
+      
+      colwidth += wcwidth (wc[0]);
+      left_len -= nbytes;
+
+      /* Compute the point to break the string. */
+      if (left_len == 0)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining;
+	}
+      else if (iswspace (wc[0]))
+	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
+      else  if ((wc[0] == L'-' || wc[0] == L'/')
+	        && iswalpha (pre_wc))
+	pre_len_to_print = cur_len_to_print = remaining - left_len;
+      else if (nbytes > 1)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining - left_len;
+	  pre_wc = wc[0];
+	  pre_nbytes = nbytes;
+	  /* Keep the puncts printed with the previous word. */
+	  if ((nbytes = mbrtowc (wc, msgstr + nbytes, left_len, &state)) > 0)
+	    {
+	      if (nbytes >= (size_t) -2)
+		break;
+	      if (iswpunct (wc[0]) && iswalpha (pre_wc))
+		{
+		  cur_len_to_print = remaining - left_len + nbytes;
+		  colwidth += wcwidth (wc[0]);
+		  msgstr += nbytes;
+		  left_len -= nbytes;
+		}
+	      nbytes = pre_nbytes;
+	    }
+	}
+
+      if (colwidth >= room)
+	{
+	  /* If the string is outside ROOM columns after counted the last
+	     word, the break point will be set to the previous point it kept. */
+	  if (colwidth > room)
+	    cur_len_to_print = pre_len_to_print;
+	  break;
+	}
+      msgstr += nbytes;
+      pre_wc = wc[0];
+    }
+  return cur_len_to_print;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,9 +213,7 @@ gcc_gettext_width (const char *msgstr)
 
 #endif
 
-#endif /* ENABLE_NLS */
-
-#ifndef ENABLE_NLS
+#else /* Not defined ENABLE_NLS */
 
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
@@ -132,6 +224,35 @@ fake_ngettext (const char *singular, con
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+			       && defined HAVE_WORKING_MBSTOWCS \
+			       && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+	break;
+      if (msgstr[i] == ' ')
+	len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+	       && msgstr[i + 1] != ' '
+	       && i > 0 && ISALPHA (msgstr[i - 1]))
+	len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 157720)
+++ opts.c	(working copy)
@@ -1149,45 +1149,45 @@ decode_options (unsigned int argc, const
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
-   followed by word-wrapped HELP in a second column.  */
+/* Output ITEM in the left column and HELP in the second column with the total
+   COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the
+   corresponding number of spaces for alignment. If the ROOM is less than
+   REMAINING, look for a suitable point at which to break HELP, Otherwise, take
+   take the end of the string.
+   If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in
+   the left column from the secode line. */
 static void
 wrap_help (const char *help,
 	   const char *item,
-	   unsigned int item_width,
 	   unsigned int columns)
 {
-  unsigned int col_width = LEFT_COLUMN;
   unsigned int remaining, room, len;
+  unsigned int col_width = LEFT_COLUMN;
+  unsigned int item_width = gcc_gettext_width (item);
 
   remaining = strlen (help);
 
   do
     {
+      int n_spaces = col_width - item_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+	{
+	  const char *space = " ";
+	  printf ("%*s", n_spaces, space);
+	} 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
-
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
+	len = get_aligned_len (help, room, remaining);
 
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1226,7 +1226,7 @@ print_filtered_help (unsigned int includ
 	  /* Get the translation.  */
 	  help = _(help);
 
-	  wrap_help (help, param, strlen (param), columns);
+	  wrap_help (help, param, columns);
 	}
       putchar ('\n');
       return;
@@ -1242,6 +1242,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1279,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else
@@ -1319,7 +1323,7 @@ print_filtered_help (unsigned int includ
 	  help = new_help;
 	}
 
-      wrap_help (help, opt, len, columns);
+      wrap_help (help, opt, columns);
       displayed = true;
     }
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-03-31  7:03                             ` Shujing Zhao
@ 2010-04-06 16:56                               ` Joseph S. Myers
  2010-04-13 11:32                                 ` Shujing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph S. Myers @ 2010-04-06 16:56 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Wed, 31 Mar 2010, Shujing Zhao wrote:

> +  unsigned int colwidth = 0;

No comment on this variable.

> +  /* The number of bytes that have been found can be printed at ROOM columns. */
> +  unsigned int pre_len_to_print;

The semantics of this variable seem confusing to me.  There isn't a clear 
difference between "have been found can be printed" and "can be printed".

> +  /* The possible number of bytes that can be printed at ROOM columns. */
> +  unsigned int cur_len_to_print = 0;

I think you mean something like "The number of bytes that can be printed 
at the best possible line-breaking position found so far, or 0 if no 
suitable break has been found.".  Or is that pre_len_to_print?

> +  /* The number of bytes that MSGSTR left to convert to wide character. */

Conversion to wide characters is part of the implementation of this loop, 
not its interface.  "The number of bytes of MSGSTR not yet examined for 
possible line breaks."?

> +  /* Convert one character at a time, keep track of the width at each point 
> +     and of the byte count to the last possible point for a line break. The 
> +     best candidate point found so far to break MSGSTR is after the space,
> +     some puncts or multibyte character with the number of bytes
> +     CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere

Do you mean "is after CUR_LEN_TO_PRINT bytes of the string"?  Saying "with 
the number of bytes" sounds like counting the bytes in a single character, 
rather than being a position within the string.

> +     PRE_LEN_TO_PRINT to break the string, that is the best point available;

Previously you said CUR_LEN_TO_PRINT is the best point so far.  Now you're 
saying PRE_LEN_TO_PRINT is best.  Which is?

> +     otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
> +     is a valid point to break. */ 

Why wouldn't it be valid, if it was previously the best valid point?

Perhaps you need to distinguish clearly whether a variable's value 
represents what is known to be a valid point for breaking the string, or 
might be one subject to further tests.  "cur_" and "pre_" do not seem 
adequate to represent that distinction.  The variable for a value still 
subject to further tests (if needed at all) should be local to a single 
iteration of the loop.

> +  while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
> +    {
> +      if (nbytes >= (size_t) -2)
> +	/* Invalide input string. */

"Invalid".

> +      /* Compute the point to break the string. */
> +      if (left_len == 0)
> +	{
> +	  pre_len_to_print = cur_len_to_print;
> +	  cur_len_to_print = remaining;

I think it might be clearer to have the loop in the form

  while (left_len > 0)

which at least would avoid questions of what mbrtowc is meant to do when 
given a length of 0.  After the loop, if cur_len_to_print is 0 or the 
entire string would fit in the given number of columns, set 
cur_len_to_print accordingly to indicate that the whole string should be 
printed.  Then make the loop mimic that in the non-i18n version of the 
function as much as possible: if the previous columns occupy at least the 
full width and a point to break has been found, then break; otherwise 
decode a character; if a space then breaking *before* the space is OK; if 
'-' or '/' (and the other conditions apply) then breaking *after* that 
character is OK.

Incidentally, it would be a lot clearer if you had a single variable for 
the byte index into the string (like the non-i18n version) instead of both 
increasing msgstr and decreasing left_len - so the loop would be "while (i 
< remaining)".  You wouldn't need to do "remaining - left_len" 
computations that way.

> +	}
> +      else if (iswspace (wc[0]))
> +	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
> +      else  if ((wc[0] == L'-' || wc[0] == L'/')
> +	        && iswalpha (pre_wc))
> +	pre_len_to_print = cur_len_to_print = remaining - left_len;

Will pre_wc always be initialized here, if the string starts with '-' or 
'/'?

> +      else if (nbytes > 1)

I still see no reason you should need conditionals on the number of bytes 
in a character, instead of always working with characters regardless of 
the number of bytes in them.

> +	{
> +	  pre_len_to_print = cur_len_to_print;
> +	  cur_len_to_print = remaining - left_len;
> +	  pre_wc = wc[0];
> +	  pre_nbytes = nbytes;
> +	  /* Keep the puncts printed with the previous word. */

I don't understand what issue you are dealing with here, but punctuation 
other than '/' or '-' isn't considered a valid line-breaking point anyway 
so you shouldn't need special code for it.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-04-06 16:56                               ` Joseph S. Myers
@ 2010-04-13 11:32                                 ` Shujing Zhao
  2010-04-15 21:54                                   ` Joseph S. Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Shujing Zhao @ 2010-04-13 11:32 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini

[-- Attachment #1: Type: text/plain, Size: 4393 bytes --]

On 04/07/2010 12:55 AM, Joseph S. Myers wrote:
Thanks. Fix the typo and all the comments. Change the confused variable 
cur_len_to_print to len_to_print.
>> +      /* Compute the point to break the string. */
>> +      if (left_len == 0)
>> +	{
>> +	  pre_len_to_print = cur_len_to_print;
>> +	  cur_len_to_print = remaining;
> 
> I think it might be clearer to have the loop in the form
> 
>   while (left_len > 0)
> 
> which at least would avoid questions of what mbrtowc is meant to do when 
> given a length of 0.  After the loop, if cur_len_to_print is 0 or the 
> entire string would fit in the given number of columns, set 
> cur_len_to_print accordingly to indicate that the whole string should be 
> printed.  Then make the loop mimic that in the non-i18n version of the 
> function as much as possible: if the previous columns occupy at least the 
> full width and a point to break has been found, then break; otherwise 
> decode a character; if a space then breaking *before* the space is OK; if 
> '-' or '/' (and the other conditions apply) then breaking *after* that 
> character is OK.
> 
> Incidentally, it would be a lot clearer if you had a single variable for 
> the byte index into the string (like the non-i18n version) instead of both 
> increasing msgstr and decreasing left_len - so the loop would be "while (i 
> < remaining)".  You wouldn't need to do "remaining - left_len" 
> computations that way.
Both pre_len_to_print and len_to_print are initialized REMAINING now. If no 
suitable line-breaking position has been found, the whole line will be printed.
a variable i is added to index the string to decode and use

while (i < remaining)

to control the loop. It looks clear. Thanks.
> 
>> +	}
>> +      else if (iswspace (wc[0]))
>> +	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
>> +      else  if ((wc[0] == L'-' || wc[0] == L'/')
>> +	        && iswalpha (pre_wc))
>> +	pre_len_to_print = cur_len_to_print = remaining - left_len;
> 
> Will pre_wc always be initialized here, if the string starts with '-' or 
> '/'?
It is changed to
+      else if ((wc[0] == L'-' || wc[0] == L'/')
+              && i > nbytes
+              && iswalpha (pre_wc))
(i > nbytes) would ensure the pre_wc is initialized. I found this solution from 
the non-i18n version.
> 
>> +      else if (nbytes > 1)
> 
> I still see no reason you should need conditionals on the number of bytes 
> in a character, instead of always working with characters regardless of 
> the number of bytes in them.
This condition is to make the string can be break after a multi bytes wide 
alphabetic or punctuations. The line can be break after every multi bytes wide 
alphabetic or punctuations. I think the nbytes can distinguish if it is a wide 
character before decode. The letter 'a' is recognized wide character after the 
decoding, but the line can't be broken after a 'a' at the string "after". I 
think the difference between one byte wide character and the multi bytes wide 
character is the nbytes.

Is it better changed to

+      else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)

better?
> 
>> +	{
>> +	  pre_len_to_print = cur_len_to_print;
>> +	  cur_len_to_print = remaining - left_len;
>> +	  pre_wc = wc[0];
>> +	  pre_nbytes = nbytes;
>> +	  /* Keep the puncts printed with the previous word. */
> 
> I don't understand what issue you are dealing with here, but punctuation 
> other than '/' or '-' isn't considered a valid line-breaking point anyway 
> so you shouldn't need special code for it.
> 
Since the line can be broken after a multi bytes alphabetic, the following 
punctuation maybe be printed as the first character at the new line if the 
punctuation didn't be dealing with. I also found at some special cases, only a 
"full stop" printed at a new line. It looks strange. So I design to print these 
punctuation with the previous alphabetic.
That is to say, if a wide character followed by a wide character punctuation, 
the possible line-breaking position would not be set after this wide character. 
It would be set after the punctuation.
yeah, the '/' and '-' must be considered. How about used (nbytes > 1) again to 
distinguish the multi bytes wide character punctuation? It at least ensure the 
Chinese multi bytes punctuations are handled. The other punctuation will be 
handle just like the non-i18n version.

Thanks
Pearly

[-- Attachment #2: 0413.patch --]
[-- Type: text/x-patch, Size: 8546 bytes --]

Index: intl.h
===================================================================
--- intl.h	(revision 158214)
+++ intl.h	(working copy)
@@ -61,6 +61,8 @@ extern const char *fake_ngettext(const c
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+				     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 158214)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@ gcc_init_libintl (void)
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,103 @@ gcc_gettext_width (const char *msgstr)
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Look for a suitable point at which to break string MSGSTR. If there is one 
+   after at most ROOM columns, take the latest such point. Otherwise, 
+   take the first such point in the string, or the end of the string if it 
+   cannot be broken.
+   REMAINING is the number of bytes of MSGSTR.
+   Return the number of bytes before the point. */
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  size_t nbytes;
+  wchar_t wc[1]; 
+  wchar_t pre_wc;
+
+  /* The byte index of the string when convert to a wide character. */
+  unsigned int i = 0;
+  
+  /* The number of column positions for the printed wide-character string. */
+  unsigned int colwidth = 0;
+
+  /* The number of bytes that may be printed at the possible line-breaking
+     position found so far, or REMAINING if no suitable break has been found. */
+  unsigned int len_to_print = remaining;
+
+  /* The previous number of bytes that can be printed at the possible
+     line-breaking position found so far, or REMAINING if no suitable
+     break has been found. */
+  unsigned int pre_len_to_print = remaining;
+
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  /* Convert one character at a time, keep track of the width at each point 
+     and of the byte count to the last possible point for a line break. The 
+     best candidate point found so far to break MSGSTR is after the space,
+     '-', '/', the multi bytes puncts or the multi bytes alphabetic.
+     If we are outside ROOM columns after LEN_TO_PRINT and have found
+     somewhere PRE_LEN_TO_PRINT to break the string, PRE_LEN_TO_PRINT is the
+     best point available; otherwise, the string will be break after
+     LEN_TO_PRINT. */ 
+  do
+    {
+      nbytes = mbrtowc (wc, msgstr, remaining - i, &state);
+      if (nbytes >= (size_t) -2)
+	/* Invalid input string. */
+	break;
+
+      i += nbytes;
+      colwidth += wcwidth (wc[0]);
+
+      /* Compute the point to break the string. */
+      if (i == remaining)
+	{
+	  pre_len_to_print = len_to_print;
+	  len_to_print = remaining;
+	}
+      else if (iswspace (wc[0]))
+	pre_len_to_print = len_to_print = i - nbytes;
+      else if ((wc[0] == L'-' || wc[0] == L'/')
+	       && i > nbytes
+	       && iswalpha (pre_wc))
+	pre_len_to_print = len_to_print = i;
+      else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)
+	{
+	  size_t nbytes_next;
+	  pre_len_to_print = len_to_print;
+	  len_to_print = i;
+	  pre_wc = wc[0];
+	  nbytes_next = mbrtowc (wc, msgstr + nbytes, remaining - i, &state);
+	  if (nbytes_next >= (size_t) -2)
+	    break;
+	  /* Keep the multi bytes puncts printed with the previous word. */
+	  if (iswpunct (wc[0]) && iswalpha (pre_wc) && nbytes_next > 1)
+	    {
+	      i += nbytes_next;
+	      colwidth += wcwidth (wc[0]);
+	      len_to_print = i;
+	      msgstr += nbytes_next;
+	    }
+	}
+
+      if (colwidth >= room)
+	{
+	  /* If the string is outside ROOM columns after the possible 
+	     LEN_TO_PRINT, the break point will be set to the previous point. */
+	  if (colwidth > room)
+	    len_to_print = pre_len_to_print;
+	  break;
+	}
+
+      msgstr += nbytes;
+      pre_wc = wc[0];
+    }
+  while (i < remaining);
+  return len_to_print;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,9 +217,7 @@ gcc_gettext_width (const char *msgstr)
 
 #endif
 
-#endif /* ENABLE_NLS */
-
-#ifndef ENABLE_NLS
+#else /* Not defined ENABLE_NLS */
 
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
@@ -132,6 +228,35 @@ fake_ngettext (const char *singular, con
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+			       && defined HAVE_WORKING_MBSTOWCS \
+			       && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+	break;
+      if (msgstr[i] == ' ')
+	len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+	       && msgstr[i + 1] != ' '
+	       && i > 0 && ISALPHA (msgstr[i - 1]))
+	len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 158214)
+++ opts.c	(working copy)
@@ -1143,45 +1143,45 @@ decode_options (unsigned int argc, const
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
-   followed by word-wrapped HELP in a second column.  */
+/* Output ITEM in the left column and HELP in the second column with the total
+   COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the
+   corresponding number of spaces for alignment. If the ROOM is less than
+   REMAINING, look for a suitable point at which to break HELP, Otherwise, take
+   take the end of the string.
+   If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in
+   the left column from the secode line. */
 static void
 wrap_help (const char *help,
 	   const char *item,
-	   unsigned int item_width,
 	   unsigned int columns)
 {
-  unsigned int col_width = LEFT_COLUMN;
   unsigned int remaining, room, len;
+  unsigned int col_width = LEFT_COLUMN;
+  unsigned int item_width = gcc_gettext_width (item);
 
   remaining = strlen (help);
 
   do
     {
+      int n_spaces = col_width - item_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+	{
+	  const char *space = " ";
+	  printf ("%*s", n_spaces, space);
+	} 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
-
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
+	len = get_aligned_len (help, room, remaining);
 
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1220,7 +1220,7 @@ print_filtered_help (unsigned int includ
 	  /* Get the translation.  */
 	  help = _(help);
 
-	  wrap_help (help, param, strlen (param), columns);
+	  wrap_help (help, param, columns);
 	}
       putchar ('\n');
       return;
@@ -1236,6 +1236,7 @@ print_filtered_help (unsigned int includ
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1272,7 +1273,10 @@ print_filtered_help (unsigned int includ
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else
@@ -1313,7 +1317,7 @@ print_filtered_help (unsigned int includ
 	  help = new_help;
 	}
 
-      wrap_help (help, opt, len, columns);
+      wrap_help (help, opt, columns);
       displayed = true;
     }
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH PR/42686] Align the help text output
  2010-04-13 11:32                                 ` Shujing Zhao
@ 2010-04-15 21:54                                   ` Joseph S. Myers
  0 siblings, 0 replies; 20+ messages in thread
From: Joseph S. Myers @ 2010-04-15 21:54 UTC (permalink / raw)
  To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini

On Tue, 13 Apr 2010, Shujing Zhao wrote:

> > > +      else if (nbytes > 1)
> > 
> > I still see no reason you should need conditionals on the number of bytes in
> > a character, instead of always working with characters regardless of the
> > number of bytes in them.
> This condition is to make the string can be break after a multi bytes wide
> alphabetic or punctuations. The line can be break after every multi bytes wide
> alphabetic or punctuations. I think the nbytes can distinguish if it is a wide
> character before decode. The letter 'a' is recognized wide character after the
> decoding, but the line can't be broken after a 'a' at the string "after". I
> think the difference between one byte wide character and the multi bytes wide
> character is the nbytes.
> 
> Is it better changed to
> 
> +      else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1)
> 
> better?

I still cannot make sense of what the logical condition is that this code 
is trying to implement; that is, the logical properties of a pair of 
characters and what the conclusion is from those properties about whether 
a break is or is not permitted at a particular location in relation to 
those characters.  You have a series of conditions that might be described 
something like (and it's possible I'm not understanding your intent):

/* We can break at the end of the string if it is narrow enough.  */
/* If there is a space, we can break before the space (and not print the 
   space).  */
/* Break after '-' or '/' if the previous character was alphabetical, but 
   not in the middle of "--" (for example).  */

But what in plain English is the rule this code is trying to implement for 
a condition on breaking or not breaking the string?

Whatever the condition is, it relates to some logical properties of the 
characters in question.  The number of bytes is not a logical property; 
it's a physical property of the particular locale character set and may 
vary from character set to character set (gettext will automatically 
translate using iconv to the LC_CTYPE character set).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-04-15 21:51 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-25 10:32 [PATCH PR/42686] Align the help text output Shujing Zhao
2010-02-25 16:34 ` Joseph S. Myers
2010-03-10 10:26   ` Shujing Zhao
2010-03-10 17:19     ` Joseph S. Myers
2010-03-12 11:34       ` Shujing Zhao
2010-03-12 17:09         ` Joseph S. Myers
2010-03-15  9:28           ` Shujing Zhao
2010-03-15 10:58             ` Shujing Zhao
2010-03-15 11:42             ` Joseph S. Myers
2010-03-17  7:57               ` Shujing Zhao
2010-03-17 12:50                 ` Joseph S. Myers
2010-03-18 11:48                   ` Shujing Zhao
2010-03-18 17:54                     ` Joseph S. Myers
2010-03-19 11:43                       ` Shujing Zhao
2010-03-19 13:44                         ` Joseph S. Myers
2010-03-22  9:17                           ` Shujing Zhao
2010-03-31  7:03                             ` Shujing Zhao
2010-04-06 16:56                               ` Joseph S. Myers
2010-04-13 11:32                                 ` Shujing Zhao
2010-04-15 21:54                                   ` Joseph S. Myers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).