From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-258792-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 2417 invoked by alias); 22 Mar 2010 08:43:05 -0000
Received: (qmail 2405 invoked by uid 22791); 22 Mar 2010 08:43:03 -0000
X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 	tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from rcsinet11.oracle.com (HELO rcsinet11.oracle.com) (148.87.113.123)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 22 Mar 2010 08:42:58 +0000
Received: from rcsinet13.oracle.com (rcsinet13.oracle.com [148.87.113.125]) 	by rcsinet11.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id o2M8gTiF031851 	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); 	Mon, 22 Mar 2010 08:42:53 GMT
Received: from acsmt353.oracle.com (acsmt353.oracle.com [141.146.40.153]) 	by rcsinet13.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id o2M8PVH0020828; 	Mon, 22 Mar 2010 08:42:28 GMT
Received: from abhmt015.oracle.com by acsmt355.oracle.com 	with ESMTP id 99537491269247342; Mon, 22 Mar 2010 01:42:22 -0700
Received: from [10.182.121.28] (/10.182.121.28) 	by default (Oracle Beehive Gateway v4.0) 	with ESMTP ; Mon, 22 Mar 2010 01:42:21 -0700
Message-ID: <4BA72D29.2080406@oracle.com>
Date: Mon, 22 Mar 2010 09:17:00 -0000
From: Shujing Zhao <pearly.zhao@oracle.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20100228)
MIME-Version: 1.0
To: "Joseph S. Myers" <joseph@codesourcery.com>
CC: GCC Patches <gcc-patches@gcc.gnu.org>,         Paolo Carlini <paolo.carlini@oracle.com>
Subject: Re: [PATCH PR/42686] Align the help text output
References: <4B863559.4000407@oracle.com> <Pine.LNX.4.64.1002251547410.30254@digraph.polyomino.org.uk> <4B976D51.8010705@oracle.com> <Pine.LNX.4.64.1003101656110.12539@digraph.polyomino.org.uk> <4B9A1E65.4080506@oracle.com> <Pine.LNX.4.64.1003121651160.31790@digraph.polyomino.org.uk> <4B9DFA37.9010303@oracle.com> <Pine.LNX.4.64.1003151052520.31811@digraph.polyomino.org.uk> <4BA07D89.3040406@oracle.com> <Pine.LNX.4.64.1003171201450.2148@digraph.polyomino.org.uk> <4BA2010C.9030103@oracle.com> <Pine.LNX.4.64.1003181650570.19807@digraph.polyomino.org.uk> <4BA35936.8070304@oracle.com> <Pine.LNX.4.64.1003191301180.30074@digraph.polyomino.org.uk>
In-Reply-To: <Pine.LNX.4.64.1003191301180.30074@digraph.polyomino.org.uk>
Content-Type: multipart/mixed;  boundary="------------030907050902060103060008"
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-03/txt/msg00961.txt.bz2

This is a multi-part message in MIME format.
--------------030907050902060103060008
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 4299

On 03/19/2010 09:26 PM, Joseph S. Myers wrote:
> On Fri, 19 Mar 2010, Shujing Zhao wrote:
> 
>>> Why this increment?  Characters may have width 0 (combining characters) or
>>> more than 1 and it's not clear why adding 1 should be the correct thing to
>>> do here.
>> It is changed to *length--* at the new logic.
>> +      colwidth += wcwidth (wc[0]);
>> +      length--;
>> +      if (length == 0)
>> +       len = remaining;
>> I have tried to used length = length - nbytes to get the new length. But if
>> nbytes > 1 and the character is the last character of the string (length will
>> be 0 if it decreases nbytes), it would not assign the last variable *len* to
>> *len_bk*, then the character that maybe put into the ROOM would not be
>> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it
>> would be handled specially (just to update the value of len_bk). At the other
>> conditionals, len_bk is always kept consistently with len.
> 
> I'm afraid none of this makes any sense to me, because I have no idea what 
> "length", "len" and "len_bk" are meant to be, and how they differ.  You 
> need clearer variable names.  I think you also need comments at various 
> points stating what the values various variables mean at those points, and 
> maybe an overview comment about the algorithm used.  It would be best to 
> avoid ambiguous terms such as "length" at all points in variable names and 
> comments, instead talking about numbers of bytes difference between two 
> points, or similar.
> 
> For example, the present algorithm in opts.c:wrap_help might be described 
> thus:
> 
> /* Look for a suitable point at which to break HELP.  If there is one 
>    after at most ROOM columns, take the latest such point.  Otherwise, 
>    take the first such point in the string, or the end of the string if it 
>    cannot be broken.  */
> 
> If the new algorithm is still like that, you may just need to update 
> variable names (and explain what the interface is for indicating both how 
> much of the string to print, and how much (spaces) to skip before starting 
> on the next line).
> 
> The key variables to explain are LEN and I.  At each iteration in the 
> present loop, something like this might explain them.
> 
> /* The best candidate point found so far to break HELP is after LEN 
>    bytes, where either LEN < I, or LEN == REMAINING if the string cannot 
>    be broken within the first I bytes.  If we are outside ROOM columns and 
>    have found somewhere to break the string, that is the best point 
>    available; otherwise, breaking after I bytes will be better if that is 
>    a valid point to break.  */
> 
> Now, things will be a bit more complicated in that you need to track the 
> current column width as well as the current number of bytes of the string 
> that take up that width.  But I think the basic idea is still the same - 
> it is just that "length", "len" and "len_bk" are not useful variable names 
> to distinguish the different quantities being tracked and so make clear 
> whether particular arithmetic you are doing on them makes logical sense.  
Thanks. I changed the "len_bk" to "pre_len_to_print" as the number of bytes that 
have been found can be printed at ROOM columns, "len" to "cur_len_to_print" as 
the possible number of bytes that can be printed at ROOM columns, "length" to 
left_len as the number of bytes that MSGSTR left to convert to wide character.
Some comments are also added as your suggestion.

> I remain very suspicious of anything modifying a byte counter by 1 in this 
> context; if you are ever modifying a byte counter by something other than 
> the number of bytes in the relevant multibyte character I think you are 
> confused and need to sort out the logic of the code so you don't need to 
> do so.
Thanks. You are right. I found the way to solve the problem that I have 
mentioned when I explain why I have to use "length--". It just needs to store 
the previous point that can be break the string before the new point is set to 
the end of the string.

+      left_len -= nbytes;
+
+      /* Compute the point to break the string. */
+      if (left_len == 0)
+       {
+         pre_len_to_print = cur_len_to_print;
+         cur_len_to_print = remaining;
+       }

Is it Ok?

Thanks
Pearly

--------------030907050902060103060008
Content-Type: text/x-patch;
 name="0322.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="0322.patch"
Content-length: 7982

Index: intl.h
===================================================================
--- intl.h	(revision 157508)
+++ intl.h	(working copy)
@@ -60,6 +60,8 @@
 #endif
 
 extern char *get_spaces (const char *);
+extern unsigned int get_aligned_len (const char *, unsigned int,
+				     unsigned int);
 
 extern const char *open_quote;
 extern const char *close_quote;
Index: intl.c
===================================================================
--- intl.c	(revision 157508)
+++ intl.c	(working copy)
@@ -92,6 +92,7 @@
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
 #include <wchar.h>
+#include <wctype.h>
 
 /* Returns the width in columns of MSGSTR, which came from gettext.
    This is for indenting subsequent output.  */
@@ -106,6 +107,99 @@
   return wcswidth (wmsgstr, nwcs);
 }
 
+/* Look for a suitable point at which to break string MSGSTR. If there is one 
+   after at most ROOM columns, take the latest such point. Otherwise, 
+   take the first such point in the string, or the end of the string if it 
+   cannot be broken.
+   REMAINING is the number of bytes of MSGSTR.
+   Return the number of bytes before the point. */
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  size_t nbytes;
+  wchar_t wc[1]; 
+  size_t pre_nbytes;
+  wchar_t pre_wc;
+
+  unsigned int colwidth = 0;
+
+  /* The number of bytes that have been found can be printed at ROOM columns. */
+  unsigned int pre_len_to_print;
+
+  /* The possible number of bytes that can be printed at ROOM columns. */
+  unsigned int cur_len_to_print = 0;
+
+  /* The number of bytes that MSGSTR left to convert to wide character. */
+  int left_len = remaining;
+
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  /* Convert one character at a time, keep track of the width at each point 
+     and of the byte count to the last possible point for a line break. The 
+     best candidate point found so far to break MSGSTR is after the space,
+     some puncts or multibyte character with the number of bytes
+     CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere
+     PRE_LEN_TO_PRINT to break the string, that is the best point available;
+     otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that
+     is a valid point to break. */ 
+  while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0)
+    {
+      if (nbytes >= (size_t) -2)
+	/* Invalide input string. */
+	break;
+      
+      colwidth += wcwidth (wc[0]);
+      left_len -= nbytes;
+
+      /* Compute the point to break the string. */
+      if (left_len == 0)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining;
+	}
+      else if (iswspace (wc[0]))
+	pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes;
+      else  if ((wc[0] == L'-' || wc[0] == L'/')
+	        && iswalpha (pre_wc))
+	pre_len_to_print = cur_len_to_print = remaining - left_len;
+      else if (nbytes > 1)
+	{
+	  pre_len_to_print = cur_len_to_print;
+	  cur_len_to_print = remaining - left_len;
+	  pre_wc = wc[0];
+	  pre_nbytes = nbytes;
+	  /* Keep the puncts printed with the previous word. */
+	  if ((nbytes = mbrtowc (wc, msgstr + nbytes, left_len, &state)) > 0)
+	    {
+	      if (nbytes >= (size_t) -2)
+		break;
+	      if (iswpunct (wc[0]) && iswalpha (pre_wc))
+		{
+		  cur_len_to_print = remaining - left_len + nbytes;
+		  colwidth += wcwidth (wc[0]);
+		  msgstr += nbytes;
+		  left_len -= nbytes;
+		}
+	      nbytes = pre_nbytes;
+	    }
+	}
+
+      if (colwidth >= room)
+	{
+	  /* If the string is outside ROOM columns after counted the last
+	     word, the break point will be set to the previous point it kept. */
+	  if (colwidth > room)
+	    cur_len_to_print = pre_len_to_print;
+	  break;
+	}
+      msgstr += nbytes;
+      pre_wc = wc[0];
+    }
+  return cur_len_to_print;
+}
+
 #else  /* no wcswidth */
 
 /* We don't have any way of knowing how wide the string is.  Guess
@@ -119,10 +213,8 @@
 
 #endif
 
-#endif /* ENABLE_NLS */
+#else /* Not defined ENABLE_NLS */
 
-#ifndef ENABLE_NLS
-
 const char *
 fake_ngettext (const char *singular, const char *plural, unsigned long n)
 {
@@ -132,6 +224,35 @@
   return plural;
 }
 
+#endif /* ENABLE_NLS */
+
+#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H             \
+			       && defined HAVE_WORKING_MBSTOWCS \
+			       && defined HAVE_WCSWIDTH)
+
+/* Return the length of MSGSTR that can be printed within the width ROOM. 
+   REMAINING is the length of string MSGSTR counted by bytes. */
+
+unsigned int
+get_aligned_len (const char *msgstr, unsigned int room,
+		 unsigned int remaining)
+{
+  unsigned int i;
+  unsigned int len = remaining;
+  for (i = 0; msgstr[i]; i++)
+    {
+      if (i >= room && len != remaining)
+	break;
+      if (msgstr[i] == ' ')
+	len = i;
+      else if ((msgstr[i] == '-' || msgstr[i] == '/')
+	       && msgstr[i + 1] != ' '
+	       && i > 0 && ISALPHA (msgstr[i - 1]))
+	len = i + 1;
+    }
+  return len;
+}
+
 #endif
 
 /* Return the indent for successive lines, using the width of
Index: opts.c
===================================================================
--- opts.c	(revision 157508)
+++ opts.c	(working copy)
@@ -1149,45 +1149,45 @@
 
 #define LEFT_COLUMN	27
 
-/* Output ITEM, of length ITEM_WIDTH, in the left column,
-   followed by word-wrapped HELP in a second column.  */
+/* Output ITEM in the left column and HELP in the second column with the total
+   COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the
+   corresponding number of spaces for alignment. If the ROOM is less than
+   REMAINING, look for a suitable point at which to break HELP, Otherwise, take
+   take the end of the string.
+   If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in
+   the left column from the secode line. */
 static void
 wrap_help (const char *help,
 	   const char *item,
-	   unsigned int item_width,
 	   unsigned int columns)
 {
+  unsigned int remaining, room, len;
   unsigned int col_width = LEFT_COLUMN;
-  unsigned int remaining, room, len;
+  unsigned int item_width = gcc_gettext_width (item);
 
   remaining = strlen (help);
 
   do
     {
+      int n_spaces = col_width - item_width;
+      printf ("  %s", item);
+      if (n_spaces > 0)
+	{
+	  const char *space = " ";
+	  printf ("%*s", n_spaces, space);
+	} 
+
       room = columns - 3 - MAX (col_width, item_width);
       if (room > columns)
 	room = 0;
       len = remaining;
 
       if (room < len)
-	{
-	  unsigned int i;
+	len = get_aligned_len (help, room, remaining);
 
-	  for (i = 0; help[i]; i++)
-	    {
-	      if (i >= room && len != remaining)
-		break;
-	      if (help[i] == ' ')
-		len = i;
-	      else if ((help[i] == '-' || help[i] == '/')
-		       && help[i + 1] != ' '
-		       && i > 0 && ISALPHA (help[i - 1]))
-		len = i + 1;
-	    }
-	}
-
-      printf( "  %-*.*s %.*s\n", col_width, item_width, item, len, help);
-      item_width = 0;
+      printf ( " %.*s\n", len, help); 
+      item = " ";
+      item_width = 1;
       while (help[len] == ' ')
 	len++;
       help += len;
@@ -1226,7 +1226,7 @@
 	  /* Get the translation.  */
 	  help = _(help);
 
-	  wrap_help (help, param, strlen (param), columns);
+	  wrap_help (help, param, columns);
 	}
       putchar ('\n');
       return;
@@ -1242,6 +1242,7 @@
       unsigned int len;
       const char *opt;
       const char *tab;
+      char *buf;
 
       if (include_flags == 0
 	  || ((option->flags & include_flags) != include_flags))
@@ -1278,7 +1279,10 @@
       if (tab)
 	{
 	  len = tab - help;
-	  opt = help;
+	  buf = (char *) alloca (len + 1);
+	  strncpy (buf, help, len);
+	  buf[len] = '\0';
+	  opt = buf;
 	  help = tab + 1;
 	}
       else
@@ -1319,7 +1323,7 @@
 	  help = new_help;
 	}
 
-      wrap_help (help, opt, len, columns);
+      wrap_help (help, opt, columns);
       displayed = true;
     }
 

--------------030907050902060103060008--