From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2417 invoked by alias); 22 Mar 2010 08:43:05 -0000 Received: (qmail 2405 invoked by uid 22791); 22 Mar 2010 08:43:03 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: sourceware.org Received: from rcsinet11.oracle.com (HELO rcsinet11.oracle.com) (148.87.113.123) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 22 Mar 2010 08:42:58 +0000 Received: from rcsinet13.oracle.com (rcsinet13.oracle.com [148.87.113.125]) by rcsinet11.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id o2M8gTiF031851 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 22 Mar 2010 08:42:53 GMT Received: from acsmt353.oracle.com (acsmt353.oracle.com [141.146.40.153]) by rcsinet13.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id o2M8PVH0020828; Mon, 22 Mar 2010 08:42:28 GMT Received: from abhmt015.oracle.com by acsmt355.oracle.com with ESMTP id 99537491269247342; Mon, 22 Mar 2010 01:42:22 -0700 Received: from [10.182.121.28] (/10.182.121.28) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 22 Mar 2010 01:42:21 -0700 Message-ID: <4BA72D29.2080406@oracle.com> Date: Mon, 22 Mar 2010 09:17:00 -0000 From: Shujing Zhao User-Agent: Thunderbird 2.0.0.24 (X11/20100228) MIME-Version: 1.0 To: "Joseph S. Myers" CC: GCC Patches , Paolo Carlini Subject: Re: [PATCH PR/42686] Align the help text output References: <4B863559.4000407@oracle.com> <4B976D51.8010705@oracle.com> <4B9A1E65.4080506@oracle.com> <4B9DFA37.9010303@oracle.com> <4BA07D89.3040406@oracle.com> <4BA2010C.9030103@oracle.com> <4BA35936.8070304@oracle.com> In-Reply-To: Content-Type: multipart/mixed; boundary="------------030907050902060103060008" X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-03/txt/msg00961.txt.bz2 This is a multi-part message in MIME format. --------------030907050902060103060008 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Content-length: 4299 On 03/19/2010 09:26 PM, Joseph S. Myers wrote: > On Fri, 19 Mar 2010, Shujing Zhao wrote: > >>> Why this increment? Characters may have width 0 (combining characters) or >>> more than 1 and it's not clear why adding 1 should be the correct thing to >>> do here. >> It is changed to *length--* at the new logic. >> + colwidth += wcwidth (wc[0]); >> + length--; >> + if (length == 0) >> + len = remaining; >> I have tried to used length = length - nbytes to get the new length. But if >> nbytes > 1 and the character is the last character of the string (length will >> be 0 if it decreases nbytes), it would not assign the last variable *len* to >> *len_bk*, then the character that maybe put into the ROOM would not be >> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it >> would be handled specially (just to update the value of len_bk). At the other >> conditionals, len_bk is always kept consistently with len. > > I'm afraid none of this makes any sense to me, because I have no idea what > "length", "len" and "len_bk" are meant to be, and how they differ. You > need clearer variable names. I think you also need comments at various > points stating what the values various variables mean at those points, and > maybe an overview comment about the algorithm used. It would be best to > avoid ambiguous terms such as "length" at all points in variable names and > comments, instead talking about numbers of bytes difference between two > points, or similar. > > For example, the present algorithm in opts.c:wrap_help might be described > thus: > > /* Look for a suitable point at which to break HELP. If there is one > after at most ROOM columns, take the latest such point. Otherwise, > take the first such point in the string, or the end of the string if it > cannot be broken. */ > > If the new algorithm is still like that, you may just need to update > variable names (and explain what the interface is for indicating both how > much of the string to print, and how much (spaces) to skip before starting > on the next line). > > The key variables to explain are LEN and I. At each iteration in the > present loop, something like this might explain them. > > /* The best candidate point found so far to break HELP is after LEN > bytes, where either LEN < I, or LEN == REMAINING if the string cannot > be broken within the first I bytes. If we are outside ROOM columns and > have found somewhere to break the string, that is the best point > available; otherwise, breaking after I bytes will be better if that is > a valid point to break. */ > > Now, things will be a bit more complicated in that you need to track the > current column width as well as the current number of bytes of the string > that take up that width. But I think the basic idea is still the same - > it is just that "length", "len" and "len_bk" are not useful variable names > to distinguish the different quantities being tracked and so make clear > whether particular arithmetic you are doing on them makes logical sense. Thanks. I changed the "len_bk" to "pre_len_to_print" as the number of bytes that have been found can be printed at ROOM columns, "len" to "cur_len_to_print" as the possible number of bytes that can be printed at ROOM columns, "length" to left_len as the number of bytes that MSGSTR left to convert to wide character. Some comments are also added as your suggestion. > I remain very suspicious of anything modifying a byte counter by 1 in this > context; if you are ever modifying a byte counter by something other than > the number of bytes in the relevant multibyte character I think you are > confused and need to sort out the logic of the code so you don't need to > do so. Thanks. You are right. I found the way to solve the problem that I have mentioned when I explain why I have to use "length--". It just needs to store the previous point that can be break the string before the new point is set to the end of the string. + left_len -= nbytes; + + /* Compute the point to break the string. */ + if (left_len == 0) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining; + } Is it Ok? Thanks Pearly --------------030907050902060103060008 Content-Type: text/x-patch; name="0322.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="0322.patch" Content-length: 7982 Index: intl.h =================================================================== --- intl.h (revision 157508) +++ intl.h (working copy) @@ -60,6 +60,8 @@ #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 157508) +++ intl.c (working copy) @@ -92,6 +92,7 @@ #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include +#include /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,99 @@ return wcswidth (wmsgstr, nwcs); } +/* Look for a suitable point at which to break string MSGSTR. If there is one + after at most ROOM columns, take the latest such point. Otherwise, + take the first such point in the string, or the end of the string if it + cannot be broken. + REMAINING is the number of bytes of MSGSTR. + Return the number of bytes before the point. */ +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + size_t nbytes; + wchar_t wc[1]; + size_t pre_nbytes; + wchar_t pre_wc; + + unsigned int colwidth = 0; + + /* The number of bytes that have been found can be printed at ROOM columns. */ + unsigned int pre_len_to_print; + + /* The possible number of bytes that can be printed at ROOM columns. */ + unsigned int cur_len_to_print = 0; + + /* The number of bytes that MSGSTR left to convert to wide character. */ + int left_len = remaining; + + mbstate_t state; + memset (&state, '\0', sizeof (state)); + + /* Convert one character at a time, keep track of the width at each point + and of the byte count to the last possible point for a line break. The + best candidate point found so far to break MSGSTR is after the space, + some puncts or multibyte character with the number of bytes + CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere + PRE_LEN_TO_PRINT to break the string, that is the best point available; + otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that + is a valid point to break. */ + while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0) + { + if (nbytes >= (size_t) -2) + /* Invalide input string. */ + break; + + colwidth += wcwidth (wc[0]); + left_len -= nbytes; + + /* Compute the point to break the string. */ + if (left_len == 0) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining; + } + else if (iswspace (wc[0])) + pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes; + else if ((wc[0] == L'-' || wc[0] == L'/') + && iswalpha (pre_wc)) + pre_len_to_print = cur_len_to_print = remaining - left_len; + else if (nbytes > 1) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining - left_len; + pre_wc = wc[0]; + pre_nbytes = nbytes; + /* Keep the puncts printed with the previous word. */ + if ((nbytes = mbrtowc (wc, msgstr + nbytes, left_len, &state)) > 0) + { + if (nbytes >= (size_t) -2) + break; + if (iswpunct (wc[0]) && iswalpha (pre_wc)) + { + cur_len_to_print = remaining - left_len + nbytes; + colwidth += wcwidth (wc[0]); + msgstr += nbytes; + left_len -= nbytes; + } + nbytes = pre_nbytes; + } + } + + if (colwidth >= room) + { + /* If the string is outside ROOM columns after counted the last + word, the break point will be set to the previous point it kept. */ + if (colwidth > room) + cur_len_to_print = pre_len_to_print; + break; + } + msgstr += nbytes; + pre_wc = wc[0]; + } + return cur_len_to_print; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -119,10 +213,8 @@ #endif -#endif /* ENABLE_NLS */ +#else /* Not defined ENABLE_NLS */ -#ifndef ENABLE_NLS - const char * fake_ngettext (const char *singular, const char *plural, unsigned long n) { @@ -132,6 +224,35 @@ return plural; } +#endif /* ENABLE_NLS */ + +#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \ + && defined HAVE_WORKING_MBSTOWCS \ + && defined HAVE_WCSWIDTH) + +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} + #endif /* Return the indent for successive lines, using the width of Index: opts.c =================================================================== --- opts.c (revision 157508) +++ opts.c (working copy) @@ -1149,45 +1149,45 @@ #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, - followed by word-wrapped HELP in a second column. */ +/* Output ITEM in the left column and HELP in the second column with the total + COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the + corresponding number of spaces for alignment. If the ROOM is less than + REMAINING, look for a suitable point at which to break HELP, Otherwise, take + take the end of the string. + If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in + the left column from the secode line. */ static void wrap_help (const char *help, const char *item, - unsigned int item_width, unsigned int columns) { + unsigned int remaining, room, len; unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int item_width = gcc_gettext_width (item); remaining = strlen (help); do { + int n_spaces = col_width - item_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; + len = get_aligned_len (help, room, remaining); - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } - - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1226,7 +1226,7 @@ /* Get the translation. */ help = _(help); - wrap_help (help, param, strlen (param), columns); + wrap_help (help, param, columns); } putchar ('\n'); return; @@ -1242,6 +1242,7 @@ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1279,10 @@ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else @@ -1319,7 +1323,7 @@ help = new_help; } - wrap_help (help, opt, len, columns); + wrap_help (help, opt, columns); displayed = true; } --------------030907050902060103060008--