* [PATCH PR/42686] Align the help text output @ 2010-02-25 10:32 Shujing Zhao 2010-02-25 16:34 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-02-25 10:32 UTC (permalink / raw) To: GCC Patches; +Cc: Joseph S. Myers, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 433 bytes --] Hi, This patch is to align the help text output when it includes the characters that use three bytes in UTF-8 in some languages, such as Chinese. The option name is left-aligned with the width 27 or the length of itself from the 3th column. The help text is left-aligned with the width 50 (default) from the 31th column. Tested on i686-pc-linux-gnu with no regression to the "C" locale environment. OK for trunk? Thanks Pearly [-- Attachment #2: ChangeLog --] [-- Type: text/plain, Size: 261 bytes --] 2010-02-25 Shujing Zhao <pearly.zhao@oracle.com> PR translation/42686 * opts.c (wrap_help): Align the help output when it includes the characters that use three bytes in UTF-8. (print_filtered_help): Split the help string to option name and help text. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: pr42686.patch --] [-- Type: text/x-patch; name="pr42686.patch", Size: 1668 bytes --] Index: opts.c =================================================================== --- opts.c ï¼ä¿®è®¢ç 157061ï¼ +++ opts.c ï¼å·¥ä½æ·è´ï¼ @@ -1158,8 +1158,9 @@ wrap_help (const char *help, unsigned int columns) { unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int remaining, room, len, width_diff; + width_diff = item_width - gcc_gettext_width(item); remaining = strlen (help); do @@ -1183,10 +1184,20 @@ wrap_help (const char *help, && help[i + 1] != ' ' && i > 0 && ISALPHA (help[i - 1])) len = i + 1; + /* For the characters use three bytes in UTF-8 in some languages, + such as Chinese, move three bytes everytime. */ + else if (help[i] < 0 && help[i + 1] < 0 && help[i + 2] < 0) + { + i = i + 2; + len = i + 1; + room = room + 1; + } } } - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); + printf (" %-*.*s %.*s\n", + col_width + width_diff, item_width, item, len, help); + width_diff = 0; item_width = 0; while (help[len] == ' ') len++; @@ -1242,6 +1253,7 @@ print_filtered_help (unsigned int includ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1290,10 @@ print_filtered_help (unsigned int includ if (tab) { len = tab - help; - opt = help; + buf = (char *)alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-02-25 10:32 [PATCH PR/42686] Align the help text output Shujing Zhao @ 2010-02-25 16:34 ` Joseph S. Myers 2010-03-10 10:26 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-02-25 16:34 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Thu, 25 Feb 2010, Shujing Zhao wrote: > Hi, > > This patch is to align the help text output when it includes the characters > that use three bytes in UTF-8 in some languages, such as Chinese. > The option name is left-aligned with the width 27 or the length of itself from > the 3th column. The help text is left-aligned with the width 50 (default) from > the 31th column. I can't understand this description of what you intend to achieve here. > + width_diff = item_width - gcc_gettext_width(item); Missing space before "(". > @@ -1183,10 +1184,20 @@ wrap_help (const char *help, > && help[i + 1] != ' ' > && i > 0 && ISALPHA (help[i - 1])) > len = i + 1; > + /* For the characters use three bytes in UTF-8 in some languages, > + such as Chinese, move three bytes everytime. */ > + else if (help[i] < 0 && help[i + 1] < 0 && help[i + 2] < 0) > + { > + i = i + 2; > + len = i + 1; > + room = room + 1; This does not make sense to me. Checking whether a "char" is negative is not meaningful in portable code since char may be either signed or unsigned, the locale may not use UTF-8, and some characters may have other lengths. You have a translated help text, in whatever the locale character set is. You need to find a suitable point to break a line. In so doing, you need to step by multibyte characters (using interfaces such as mbrtowc to do so - put wrappers in intl.c that fall back to single characters if --disable-nls or the relevant interfaces are not available). You can then examine the wide characters - not the multibyte sequences - to see what is a suitable point to wrap the line, and determine the actual display width of each wide character in the process. I don't know if comparing with L'-', L'/' and L' ' is really appropriate for all languages, but don't have a better option. However, the locale-independent ISALPHA, used in the present code, clearly seems inappropriate; using iswalpha (again with a fallback for --disable-nls or when this function is not present) would be better. I'd be inclined to put the entire logic for testing whether something is a suitable point for wrapping, and for determining what characters to skip after wrapping, in a function in intl.c. (The code that supports wrapping lines for diagnostics suffers the same problems with counting bytes rather than display width, and might be able to share some of the fix.) > - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); > + printf (" %-*.*s %.*s\n", > + col_width + width_diff, item_width, item, len, help); > + width_diff = 0; Please note that both width and precision in printf formats count *bytes* not *multibyte characters* or *screen columns*. So once you've corrected the logic above and know both how wide the text you want to print is in screen columns, and how many bytes it is, you need to think carefully about what the correct width and precision parameters are. > + buf = (char *)alloca (len + 1); Missing space after ")" in cast. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-02-25 16:34 ` Joseph S. Myers @ 2010-03-10 10:26 ` Shujing Zhao 2010-03-10 17:19 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-10 10:26 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 3362 bytes --] Hi Joseph, all On 02/26/2010 12:09 AM, Joseph S. Myers wrote: >> @@ -1183,10 +1184,20 @@ wrap_help (const char *help, >> && help[i + 1] != ' ' >> && i > 0 && ISALPHA (help[i - 1])) >> len = i + 1; >> + /* For the characters use three bytes in UTF-8 in some languages, >> + such as Chinese, move three bytes everytime. */ >> + else if (help[i] < 0 && help[i + 1] < 0 && help[i + 2] < 0) >> + { >> + i = i + 2; >> + len = i + 1; >> + room = room + 1; > > This does not make sense to me. Checking whether a "char" is negative is > not meaningful in portable code since char may be either signed or > unsigned, the locale may not use UTF-8, and some characters may have other > lengths. > > You have a translated help text, in whatever the locale character set is. > You need to find a suitable point to break a line. In so doing, you need > to step by multibyte characters (using interfaces such as mbrtowc to do so > - put wrappers in intl.c that fall back to single characters if > --disable-nls or the relevant interfaces are not available). You can then > examine the wide characters - not the multibyte sequences - to see what is > a suitable point to wrap the line, and determine the actual display width > of each wide character in the process. I don't know if comparing with > L'-', L'/' and L' ' is really appropriate for all languages, but don't > have a better option. However, the locale-independent ISALPHA, used in > the present code, clearly seems inappropriate; using iswalpha (again with > a fallback for --disable-nls or when this function is not present) would > be better. > > I'd be inclined to put the entire logic for testing whether something is a > suitable point for wrapping, and for determining what characters to skip > after wrapping, in a function in intl.c. (The code that supports wrapping > lines for diagnostics suffers the same problems with counting bytes rather > than display width, and might be able to share some of the fix.) > >> - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); >> + printf (" %-*.*s %.*s\n", >> + col_width + width_diff, item_width, item, len, help); >> + width_diff = 0; > > Please note that both width and precision in printf formats count *bytes* > not *multibyte characters* or *screen columns*. So once you've corrected > the logic above and know both how wide the text you want to print is in > screen columns, and how many bytes it is, you need to think carefully > about what the correct width and precision parameters are. > >> + buf = (char *)alloca (len + 1); > > Missing space after ")" in cast. > Thanks your instruction for wide character handling. The updated patch handle the help string in a separate function get_aligned_len when the number of bytes is not equal to the number of column positions for a wide-character string. The functions when --disable-nls or the relevant interfaces are not available are also defined. The questions is the define of get_col_width and get_aligned_len would never be used, because gcc_gettext_width (s) is always equal to strlen (s) when --disable-nls. Then, do you think the definitions of get_col_width and get_aligned_len for --disable-nls are necessary? Tested on i686-pc-linux-gnu. Is it ok? Thanks Pearly [-- Attachment #2: ChangeLog --] [-- Type: text/plain, Size: 423 bytes --] 2010-03-10 Shujing Zhao <pearly.zhao@oracle.com> PR translation/42686 * intl.h (get_col_width, get_aligned_len): New prototype. * intl.c (wctomb_nbytes, get_col_width): New function. (get_aligned_len): New function to handle the wide character help string. * opts.c (wrap_help): Align the help output when it includes wide characters. (print_filtered_help): Split the help string to option name and help text. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 42686.patch --] [-- Type: text/x-patch; name="42686.patch", Size: 6265 bytes --] Index: intl.h =================================================================== --- intl.h ï¼ä¿®è®¢ç 157331ï¼ +++ intl.h ï¼å·¥ä½æ·è´ï¼ @@ -30,6 +30,9 @@ #include <libintl.h> extern void gcc_init_libintl (void); extern size_t gcc_gettext_width (const char *); +extern unsigned int get_col_width (const char *, unsigned int); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); #else /* Stubs. */ # undef textdomain @@ -41,6 +44,8 @@ extern size_t gcc_gettext_width (const c # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n) # define gcc_init_libintl() /* nothing */ # define gcc_gettext_width(s) strlen(s) +# define get_col_width(s, room) room +# define get_aligned_len(s, room, remaining) remaining extern const char *fake_ngettext(const char *singular,const char *plural, unsigned long int n); Index: intl.c =================================================================== --- intl.c ï¼ä¿®è®¢ç 157331ï¼ +++ intl.c ï¼å·¥ä½æ·è´ï¼ @@ -92,6 +92,7 @@ gcc_init_libintl (void) #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,94 @@ gcc_gettext_width (const char *msgstr) return wcswidth (wmsgstr, nwcs); } +/* Return the number of bytes when converts the wide character WC + to its multibyte representation. */ + +static int +wctomb_nbytes (wchar_t wc) +{ + char *pmb = (char *)xmalloc (MB_CUR_MAX); + int nbytes = wctomb (pmb, wc); + if (pmb) + free (pmb); + return nbytes; +} + +/* Expand the ROOM when the number of bytes of MSGSTR is larger than + the width in columns. */ + +unsigned int +get_col_width (const char *msgstr, unsigned int room) +{ + int i; + int screen_cols, nbytes; + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + for ( i = 0; wmsgstr [i]; i++) + { + /* The number of column position for a character. */ + screen_cols = wcwidth (wmsgstr [i]); + nbytes = wctomb_nbytes (wmsgstr [i]); + if (screen_cols != nbytes) + room = room + (nbytes - screen_cols); + } + return room; +} + +/* Return the number of bytes of MSGSTR that can be printed within ROOM. + REMAINING is the overall length of MSTSTR. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = 0; + unsigned int pos = 0; + unsigned int wc_num = 0; + int nbytes; + + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + + for (i = 0; i < nwcs; i++) + { + unsigned int maybe_len = len + (i - pos); + nbytes = wctomb_nbytes (wmsgstr [i]); + if (maybe_len == remaining) + len = maybe_len; + else if (maybe_len >= room) + break; + else if (nbytes > 1) + { + wc_num++; + len = wc_num * nbytes + (i + 1 - wc_num); + room = room + (nbytes - wcwidth (wmsgstr [i])); + /* Keep the puncts printed with the previous word. */ + if (iswpunct (wmsgstr [i + 1]) + && iswalpha (wmsgstr [i -1])) + { + i++; + nbytes = wctomb_nbytes (wmsgstr [i]); + if (nbytes > 1) + { + wc_num++; + len = wc_num * nbytes + (i + 1 - wc_num); + room = room + (nbytes - wcwidth (wmsgstr [i])); + } + else + len = len + 1; + } + pos = i; + } + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -117,6 +206,19 @@ gcc_gettext_width (const char *msgstr) return strlen (msgstr); } +unsigned int +get_col_width (const char *msgstr, unsigned int room) +{ + return room; +} + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + return remaining; +} + #endif #endif /* ENABLE_NLS */ Index: opts.c =================================================================== --- opts.c ï¼ä¿®è®¢ç 157331ï¼ +++ opts.c ï¼å·¥ä½æ·è´ï¼ @@ -1161,6 +1161,8 @@ wrap_help (const char *help, unsigned int remaining, room, len; remaining = strlen (help); + if (item_width != gcc_gettext_width (item)) + col_width = get_col_width (item, col_width); do { @@ -1173,20 +1175,27 @@ wrap_help (const char *help, { unsigned int i; - for (i = 0; help[i]; i++) + if (len != gcc_gettext_width (help)) { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; + len = get_aligned_len (help, room, remaining); + } + else + { + for (i = 0; help[i]; i++) + { + if (i >= room && len != remaining) + break; + if (help[i] == ' ') + len = i; + else if ((help[i] == '-' || help[i] == '/') + && help[i + 1] != ' ' + && i > 0 && ISALPHA (help[i - 1])) + len = i + 1; + } } } - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); + col_width = LEFT_COLUMN; item_width = 0; while (help[len] == ' ') len++; @@ -1242,6 +1251,7 @@ print_filtered_help (unsigned int includ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1288,10 @@ print_filtered_help (unsigned int includ if (tab) { len = tab - help; - opt = help; + buf = (char *)alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-10 10:26 ` Shujing Zhao @ 2010-03-10 17:19 ` Joseph S. Myers 2010-03-12 11:34 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-03-10 17:19 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Wed, 10 Mar 2010, Shujing Zhao wrote: > +/* Expand the ROOM when the number of bytes of MSGSTR is larger than > + the width in columns. */ > + > +unsigned int > +get_col_width (const char *msgstr, unsigned int room) The comment on this function makes no sense to me at all. What does "Expand the ROOM" mean? It sounds as if ROOM is a parameter passed by reference that might be changed by the function - but that clearly isn't the case for a parameter of type unsigned int. What are the semantics of the operand ROOM? What are the semantics of the return value? (But see below on how "the number of bytes of MSGSTR is larger than the width in columns" is itself not a sensible concept for the code to be considering in the first place.) > + if (screen_cols != nbytes) This comparison is conceptually confused. In logical terms, "columns" and "bytes" are different data types that it does not make sense to compare. It so happens that in the present C code they have the same static type so the host compiler doesn't give an error for this comparison - but it is still confused, and you need to arrange the code so it doesn't try to mix different types like this. This means having a very clear notion of whether a particular variable counts columns, bytes or characters, and of how you do explicit computations of the number of each kind of object in a particular part of a string, and no comparisons between counts of different things. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-10 17:19 ` Joseph S. Myers @ 2010-03-12 11:34 ` Shujing Zhao 2010-03-12 17:09 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-12 11:34 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 2293 bytes --] On 03/11/2010 01:05 AM, Joseph S. Myers wrote: > On Wed, 10 Mar 2010, Shujing Zhao wrote: > >> +/* Expand the ROOM when the number of bytes of MSGSTR is larger than >> + the width in columns. */ >> + >> +unsigned int >> +get_col_width (const char *msgstr, unsigned int room) > > The comment on this function makes no sense to me at all. What does > "Expand the ROOM" mean? It sounds as if ROOM is a parameter passed by > reference that might be changed by the function - but that clearly isn't > the case for a parameter of type unsigned int. What are the semantics of > the operand ROOM? What are the semantics of the return value? (But see > below on how "the number of bytes of MSGSTR is larger than the width in > columns" is itself not a sensible concept for the code to be considering > in the first place.) > >> + if (screen_cols != nbytes) > > This comparison is conceptually confused. In logical terms, "columns" and > "bytes" are different data types that it does not make sense to compare. > It so happens that in the present C code they have the same static type so > the host compiler doesn't give an error for this comparison - but it is > still confused, and you need to arrange the code so it doesn't try to mix > different types like this. This means having a very clear notion of > whether a particular variable counts columns, bytes or characters, and of > how you do explicit computations of the number of each kind of object in a > particular part of a string, and no comparisons between counts of > different things. Thanks. For the number of bytes of a wide character is different with the display column width, I have always tried to find the relationship between the display column width with justification width that printf counted by bytes. This new patch fixed the function get_col_width totally. It is to return the really field width when left-justify print the wide character strings. The design splits the width to two parts. One is to print the strings, and the other is to print the spaces when the wide character display width less than LEFT_COLUMN. Composed the two parts would get the really width counted by bytes that can be used by printf. The print of help strings looks very fine. Is it ok? Thanks Pearly [-- Attachment #2: 42686.patch --] [-- Type: text/x-patch, Size: 6137 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157401) +++ intl.h (working copy) @@ -30,6 +30,9 @@ #include <libintl.h> extern void gcc_init_libintl (void); extern size_t gcc_gettext_width (const char *); +extern unsigned int get_col_width (const char *, unsigned int); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); #else /* Stubs. */ # undef textdomain @@ -41,6 +44,8 @@ extern size_t gcc_gettext_width (const c # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n) # define gcc_init_libintl() /* nothing */ # define gcc_gettext_width(s) strlen(s) +# define get_col_width(s, room) room +# define get_aligned_len(s, room, remaining) remaining extern const char *fake_ngettext(const char *singular,const char *plural, unsigned long int n); Index: intl.c =================================================================== --- intl.c (revision 157401) +++ intl.c (working copy) @@ -92,6 +92,7 @@ gcc_init_libintl (void) #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,84 @@ gcc_gettext_width (const char *msgstr) return wcswidth (wmsgstr, nwcs); } +/* Return the number of bytes when converts the wide character WC + to its multibyte representation. */ + +static int +wctomb_nbytes (wchar_t wc) +{ + char *pmb = (char *)xmalloc (MB_CUR_MAX); + int nbytes = wctomb (pmb, wc); + if (pmb) + free (pmb); + return nbytes; +} + +/* Return the really width needed when left-justify print + the MSGSTR which came from gettext with the column width ROOM. */ + +unsigned int +get_col_width (const char *msgstr, unsigned int room) +{ + unsigned int width = gcc_gettext_width (msgstr); + if (width < room) + /* The (room -width) is the number spaces to print for aligned. */ + width = strlen (msgstr) + (room - width); + return width; +} + +/* Return the number of bytes of MSGSTR that can be printed within ROOM. + REMAINING is the overall length of MSTSTR. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = 0; + unsigned int pos = 0; + unsigned int wc_num = 0; + int nbytes; + + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + + for (i = 0; i < nwcs; i++) + { + unsigned int maybe_len = len + (i - pos); + nbytes = wctomb_nbytes (wmsgstr [i]); + if (maybe_len == remaining) + len = maybe_len; + else if (maybe_len >= room) + break; + else if (nbytes > 1) + { + wc_num++; + len = wc_num * nbytes + (i + 1 - wc_num); + room = room + (nbytes - wcwidth (wmsgstr [i])); + /* Keep the puncts printed with the previous word. */ + if (iswpunct (wmsgstr [i + 1]) + && iswalpha (wmsgstr [i -1])) + { + i++; + nbytes = wctomb_nbytes (wmsgstr [i]); + if (nbytes > 1) + { + wc_num++; + len = wc_num * nbytes + (i + 1 - wc_num); + room = room + (nbytes - wcwidth (wmsgstr [i])); + } + else + len = len + 1; + } + pos = i; + } + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -117,6 +196,19 @@ gcc_gettext_width (const char *msgstr) return strlen (msgstr); } +unsigned int +get_col_width (const char *msgstr, unsigned int room) +{ + return room; +} + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + return remaining; +} + #endif #endif /* ENABLE_NLS */ Index: opts.c =================================================================== --- opts.c (revision 157401) +++ opts.c (working copy) @@ -1157,11 +1157,16 @@ wrap_help (const char *help, unsigned int item_width, unsigned int columns) { - unsigned int col_width = LEFT_COLUMN; + unsigned int col_width; unsigned int remaining, room, len; remaining = strlen (help); + if (item_width == gcc_gettext_width (item)) + col_width = LEFT_COLUMN; + else + col_width = get_col_width (item, LEFT_COLUMN); + do { room = columns - 3 - MAX (col_width, item_width); @@ -1173,20 +1178,27 @@ wrap_help (const char *help, { unsigned int i; - for (i = 0; help[i]; i++) + if (len != gcc_gettext_width (help)) + { + len = get_aligned_len (help, room, remaining); + } + else { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; + for (i = 0; help[i]; i++) + { + if (i >= room && len != remaining) + break; + if (help[i] == ' ') + len = i; + else if ((help[i] == '-' || help[i] == '/') + && help[i + 1] != ' ' + && i > 0 && ISALPHA (help[i - 1])) + len = i + 1; + } } } - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); + col_width = LEFT_COLUMN; item_width = 0; while (help[len] == ' ') len++; @@ -1242,6 +1254,7 @@ print_filtered_help (unsigned int includ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1291,10 @@ print_filtered_help (unsigned int includ if (tab) { len = tab - help; - opt = help; + buf = (char *)alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-12 11:34 ` Shujing Zhao @ 2010-03-12 17:09 ` Joseph S. Myers 2010-03-15 9:28 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-03-12 17:09 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Fri, 12 Mar 2010, Shujing Zhao wrote: > +/* Return the really width needed when left-justify print > + the MSGSTR which came from gettext with the column width ROOM. */ I still can't make sense of this comment, and can't review the patch without clear English comments explaining the semantics of the various functions. What is "the really width"? Needed for that? What is "when left-justify print"? What is "column width ROOM"? If this is something to do with the peculiarities of printf functions (counting bytes, when you are concerned with columns for alignment), you might wish to consider simply not using printf width functionality with strings that may contain non-1-column characters. Output a certain number of bytes of the help text in one call, and a certain number of spaces in another call, and include clear comments explaining the printf peculiarities that are the reason for doing so. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-12 17:09 ` Joseph S. Myers @ 2010-03-15 9:28 ` Shujing Zhao 2010-03-15 10:58 ` Shujing Zhao 2010-03-15 11:42 ` Joseph S. Myers 0 siblings, 2 replies; 20+ messages in thread From: Shujing Zhao @ 2010-03-15 9:28 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 1428 bytes --] On 03/13/2010 12:59 AM, Joseph S. Myers wrote: > On Fri, 12 Mar 2010, Shujing Zhao wrote: > >> +/* Return the really width needed when left-justify print >> + the MSGSTR which came from gettext with the column width ROOM. */ > > I still can't make sense of this comment, and can't review the patch > without clear English comments explaining the semantics of the various > functions. What is "the really width"? Needed for that? What is "when > left-justify print"? What is "column width ROOM"? > yes, it looks quite confused now... > If this is something to do with the peculiarities of printf functions > (counting bytes, when you are concerned with columns for alignment), you > might wish to consider simply not using printf width functionality with > strings that may contain non-1-column characters. Output a certain number > of bytes of the help text in one call, and a certain number of spaces in > another call, and include clear comments explaining the printf > peculiarities that are the reason for doing so. > Yes. Breaking the printf makes the code clearer and easier. Thanks to point out it. At the new patch, I broke the printf into output item and output help string. The code to output the item that includes wide character is moved back to opts.c, for it doesn't use any wcs functions. The others are kept with the old patch. Thanks for your patience reviewing the patches. Is it OK? [-- Attachment #2: 03151650.patch --] [-- Type: text/x-patch, Size: 6061 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157453) +++ intl.h (working copy) @@ -30,6 +30,8 @@ #include <libintl.h> extern void gcc_init_libintl (void); extern size_t gcc_gettext_width (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); #else /* Stubs. */ # undef textdomain @@ -41,6 +43,7 @@ extern size_t gcc_gettext_width (const c # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n) # define gcc_init_libintl() /* nothing */ # define gcc_gettext_width(s) strlen(s) +# define get_aligned_len(s, room, remaining) remaining extern const char *fake_ngettext(const char *singular,const char *plural, unsigned long int n); Index: intl.c =================================================================== --- intl.c (revision 157453) +++ intl.c (working copy) @@ -92,6 +92,7 @@ gcc_init_libintl (void) #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,70 @@ gcc_gettext_width (const char *msgstr) return wcswidth (wmsgstr, nwcs); } +/* Return the number of bytes when converts the wide character WC + to its multibyte representation. */ + +static int +wctomb_nbytes (wchar_t wc) +{ + char *pmb = (char *)xmalloc (MB_CUR_MAX); + int nbytes = wctomb (pmb, wc); + if (pmb) + free (pmb); + return nbytes; +} + +/* Return the length of MSGSTR that can be printed within the width ROOM. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = 0; + unsigned int pos = 0; + unsigned int wc_num = 0; + int nbytes; + + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + + for (i = 0; i < nwcs; i++) + { + unsigned int maybe_len = len + (i - pos); + nbytes = wctomb_nbytes (wmsgstr [i]); + if (maybe_len == remaining) + len = maybe_len; + else if (maybe_len >= room) + break; + else if (nbytes > 1) + { + wc_num++; + len = wc_num * nbytes + (i + 1 - wc_num); + room = room + (nbytes - wcwidth (wmsgstr [i])); + /* Keep the puncts printed with the previous word. */ + if (iswpunct (wmsgstr [i + 1]) + && iswalpha (wmsgstr [i -1])) + { + i++; + nbytes = wctomb_nbytes (wmsgstr [i]); + if (nbytes > 1) + { + wc_num++; + len = wc_num * nbytes + (i + 1 - wc_num); + room = room + (nbytes - wcwidth (wmsgstr [i])); + } + else + len = len + 1; + } + pos = i; + } + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -117,6 +182,13 @@ gcc_gettext_width (const char *msgstr) return strlen (msgstr); } +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + return remaining; +} + #endif #endif /* ENABLE_NLS */ Index: opts.c =================================================================== --- opts.c (revision 157453) +++ opts.c (working copy) @@ -1158,12 +1158,31 @@ wrap_help (const char *help, unsigned int columns) { unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int remaining, room, len, item_col_width; + unsigned int i; remaining = strlen (help); + item_col_width = gcc_gettext_width (item); do { + if (item_width == item_col_width + || item_width == 0) + printf (" %-*.*s", col_width, item_width, item); + /* Handle the item that includes wide character. */ + else + { + printf (" %s", item); + /* Output a certain number of spaces for alignment. */ + if (item_col_width < col_width) + { + unsigned int n_spaces = col_width - item_col_width; + const char *space = " "; + for (i = 0; i < n_spaces; i++) + printf ("%s", space); + } + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; @@ -1171,22 +1190,25 @@ wrap_help (const char *help, if (room < len) { - unsigned int i; - - for (i = 0; help[i]; i++) + if (len != gcc_gettext_width (help)) + len = get_aligned_len (help, room, remaining); + else { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; + for (i = 0; help[i]; i++) + { + if (i >= room && len != remaining) + break; + if (help[i] == ' ') + len = i; + else if ((help[i] == '-' || help[i] == '/') + && help[i + 1] != ' ' + && i > 0 && ISALPHA (help[i - 1])) + len = i + 1; + } } } - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); + printf ( " %.*s\n", len, help); item_width = 0; while (help[len] == ' ') len++; @@ -1242,6 +1264,7 @@ print_filtered_help (unsigned int includ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1301,10 @@ print_filtered_help (unsigned int includ if (tab) { len = tab - help; - opt = help; + buf = (char *)alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-15 9:28 ` Shujing Zhao @ 2010-03-15 10:58 ` Shujing Zhao 2010-03-15 11:42 ` Joseph S. Myers 1 sibling, 0 replies; 20+ messages in thread From: Shujing Zhao @ 2010-03-15 10:58 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 651 bytes --] On 03/15/2010 05:13 PM, Shujing Zhao wrote: > Yes. Breaking the printf makes the code clearer and easier. Thanks to > point out it. At the new patch, I broke the printf into output item and > output help string. The code to output the item that includes wide > character is moved back to opts.c, for it doesn't use any wcs functions. > The others are kept with the old patch. > Thanks for your patience reviewing the patches. I reconsider the design of get_aligned_len after reading some material of line break at gnulib. And get_aligned_len should be changed to keep with the change of output item at the last patch. Is it ok? Thanks Pearly [-- Attachment #2: 03151742.patch --] [-- Type: text/x-patch, Size: 6155 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157453) +++ intl.h (working copy) @@ -30,6 +30,8 @@ #include <libintl.h> extern void gcc_init_libintl (void); extern size_t gcc_gettext_width (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); #else /* Stubs. */ # undef textdomain @@ -41,6 +43,7 @@ # define ngettext(singular,plural,n) fake_ngettext(singular,plural,n) # define gcc_init_libintl() /* nothing */ # define gcc_gettext_width(s) strlen(s) +# define get_aligned_len (s, room, remaining) remaining extern const char *fake_ngettext(const char *singular,const char *plural, unsigned long int n); Index: intl.c =================================================================== --- intl.c (revision 157453) +++ intl.c (working copy) @@ -92,6 +92,7 @@ #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,87 @@ return wcswidth (wmsgstr, nwcs); } +/* Return the number of bytes when converts the wide character WC + to its multibyte representation. */ + +static int +wctomb_nbytes (wchar_t wc) +{ + char *pmb = (char *)xmalloc (MB_CUR_MAX); + int nbytes = wctomb (pmb, wc); + if (pmb) + free (pmb); + return nbytes; +} + +/* Return the length of MSGSTR that can be printed within the width ROOM. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = 0; + unsigned int pos = 0; + int nbytes; + int len_bk; + wchar_t *buf; + unsigned int colwide = 0; + size_t remaining_width; + + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + remaining_width = wcswidth (wmsgstr, nwcs); + + for (i = 0; i < nwcs; i++) + { + unsigned int maybe_width = colwide + (i - pos); + nbytes = wctomb_nbytes (wmsgstr [i]); + if (maybe_width == remaining_width) + len = remaining; + else if (colwide > room) + { + len = len_bk; + break; + } + else if (colwide == room || maybe_width > room) + break; + else if (nbytes > 1) + { + buf = XALLOCAVEC (wchar_t, i + 2); + wcsncpy (buf, wmsgstr, i + 1); + buf [i + 1] = L'\0'; + len_bk = len; + len = wcstombs (0, buf, 0); + colwide = wcswidth (buf, i + 1); + /* Keep the puncts printed with the previous word. */ + if (iswpunct (wmsgstr [i + 1]) + && iswalpha (wmsgstr [i -1])) + { + i++; + nbytes = wctomb_nbytes (wmsgstr [i]); + if (nbytes > 1) + { + buf = XALLOCAVEC (wchar_t, i + 2); + wcsncpy (buf, wmsgstr, i + 1); + buf [i + 1] = '\0'; + len = wcstombs (0, buf, 0); + colwide = wcswidth (buf, i + 1); + } + else + { + len = len + 1; + colwide = colwide + 1; + } + } + pos = i; + } + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -117,6 +199,13 @@ return strlen (msgstr); } +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + return remaining; +} + #endif #endif /* ENABLE_NLS */ Index: opts.c =================================================================== --- opts.c (revision 157453) +++ opts.c (working copy) @@ -1158,12 +1158,29 @@ unsigned int columns) { unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int remaining, room, len, item_col_width; + unsigned int i; remaining = strlen (help); + item_col_width = gcc_gettext_width (item); do { + if (item_width == item_col_width + || item_width == 0) + printf (" %-*.*s", col_width, item_width, item); + else + { + printf (" %s", item); + if (item_col_width < col_width) + { + unsigned int n_spaces = col_width - item_col_width; + const char *space = " "; + for (i = 0; i < n_spaces; i++) + printf ("%s", space); + } + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; @@ -1171,22 +1188,25 @@ if (room < len) { - unsigned int i; - - for (i = 0; help[i]; i++) + if (len != gcc_gettext_width (help)) + len = get_aligned_len (help, room, remaining); + else { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; + for (i = 0; help[i]; i++) + { + if (i >= room && len != remaining) + break; + if (help[i] == ' ') + len = i; + else if ((help[i] == '-' || help[i] == '/') + && help[i + 1] != ' ' + && i > 0 && ISALPHA (help[i - 1])) + len = i + 1; + } } } - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); + printf ( " %.*s\n", len, help); item_width = 0; while (help[len] == ' ') len++; @@ -1242,6 +1262,7 @@ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1299,10 @@ if (tab) { len = tab - help; - opt = help; + buf = (char *)alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-15 9:28 ` Shujing Zhao 2010-03-15 10:58 ` Shujing Zhao @ 2010-03-15 11:42 ` Joseph S. Myers 2010-03-17 7:57 ` Shujing Zhao 1 sibling, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-03-15 11:42 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Mon, 15 Mar 2010, Shujing Zhao wrote: > +/* Return the number of bytes when converts the wide character WC > + to its multibyte representation. */ "Return the number of bytes in the multibyte representation of the wide character WC.". > + char *pmb = (char *)xmalloc (MB_CUR_MAX); Missing space between "(char *)" and "xmalloc". > +/* Return the length of MSGSTR that can be printed within the width ROOM. */ > + > +unsigned int > +get_aligned_len (const char *msgstr, unsigned int room, > + unsigned int remaining) You need to say what REMAINING is. > @@ -1158,12 +1158,31 @@ wrap_help (const char *help, The comment at the top of this function should be changed to make explicit that ITEM_WIDTH is a count of bytes (if that is indeed the case). > + if (item_width == item_col_width > + || item_width == 0) > + printf (" %-*.*s", col_width, item_width, item); > + /* Handle the item that includes wide character. */ > + else I think the code would be a lot clearer with fewer conditionals. Why do you need to handle specially the case where item_width == item_col_width? Can't you just use the logic below unconditionally? > + { > + printf (" %s", item); > + /* Output a certain number of spaces for alignment. */ > + if (item_col_width < col_width) > + { > + unsigned int n_spaces = col_width - item_col_width; > + const char *space = " "; > + for (i = 0; i < n_spaces; i++) > + printf ("%s", space); You don't actually need a loop here (given a number of spaces representable as an "int", which I think you can assume); you can use "%*s" as a format with an empty string argument to have printf output the number of spaces you computed. > + if (len != gcc_gettext_width (help)) > + len = get_aligned_len (help, room, remaining); Again, I'd like to avoid conditionals if possible. > + for (i = 0; help[i]; i++) > + { > + if (i >= room && len != remaining) > + break; > + if (help[i] == ' ') > + len = i; > + else if ((help[i] == '-' || help[i] == '/') > + && help[i + 1] != ' ' > + && i > 0 && ISALPHA (help[i - 1])) > + len = i + 1; > + } You now have two copies of line-splitting logic, one in get_aligned_len and one in opts.c. It would be better for the logic to be just in get_aligned_len. You might still need two copies - the one using wide character functions and a simpler one where they aren't available - but having both in the same file would be better than sometimes using logic from one file and sometimes from another file. (This would mean that intl.h no longer has a dummy definition of get_aligned_len; there would always be a real function that works out a suitable point to break a line.) > + buf = (char *)alloca (len + 1); Missing space in cast again. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-15 11:42 ` Joseph S. Myers @ 2010-03-17 7:57 ` Shujing Zhao 2010-03-17 12:50 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-17 7:57 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 3748 bytes --] On 03/15/2010 07:03 PM, Joseph S. Myers wrote: > > "Return the number of bytes in the multibyte representation of the wide > character WC.". > > > Missing space between "(char *)" and "xmalloc". > > > You need to say what REMAINING is. > >> @@ -1158,12 +1158,31 @@ wrap_help (const char *help, > > The comment at the top of this function should be changed to make explicit > that ITEM_WIDTH is a count of bytes (if that is indeed the case). Fixed at the updated patch. Thanks. > >> + if (item_width == item_col_width >> + || item_width == 0) >> + printf (" %-*.*s", col_width, item_width, item); >> + /* Handle the item that includes wide character. */ >> + else > > I think the code would be a lot clearer with fewer conditionals. Why do > you need to handle specially the case where item_width == item_col_width? > Can't you just use the logic below unconditionally? The conditional item_width == item_col_width is used to handle the ITEM without wide characters. As your suggestion, I simplify the logic as + int n_spaces = item_width <= item_col_width + ? col_width - item_width + : col_width - item_col_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } The conditional item_width <= item_col_width is still used to distinguish the ITEM with or without wide character. And it is still print the item and spaces respectively. Is it better? > >> + for (i = 0; i < n_spaces; i++) >> + printf ("%s", space); > > You don't actually need a loop here (given a number of spaces > representable as an "int", which I think you can assume); you can use > "%*s" as a format with an empty string argument to have printf output the > number of spaces you computed. Thanks. Fixed. > >> + if (len != gcc_gettext_width (help)) >> + len = get_aligned_len (help, room, remaining); > > Again, I'd like to avoid conditionals if possible. After the logic is moved to intl.c, the conditional is still kept to tell calling get_str_aligned_len or get_wcs_aligned_len. + if (gcc_gettext_width (msgstr) == remaining) + return get_str_aligned_len (msgstr, room, remaining); + else + return get_wcs_aligned_len (msgstr, room, remaining); Is it ok or any suggestion? > > > You now have two copies of line-splitting logic, one in get_aligned_len > and one in opts.c. It would be better for the logic to be just in > get_aligned_len. You might still need two copies - the one using wide > character functions and a simpler one where they aren't available - but > having both in the same file would be better than sometimes using logic > from one file and sometimes from another file. (This would mean that > intl.h no longer has a dummy definition of get_aligned_len; there would > always be a real function that works out a suitable point to break a > line.) > Move the two logics to intl.c and adjust the definition of get_aligned_len. Define function get_wcs_aligned_len to handle the wide character strings and fall back to single characters if --disable-nls or the relevant interfaces are not available. >> + buf = (char *)alloca (len + 1); > > Missing space in cast again. > Fixed. The algorithm of function get_wcs_aligned_len (the old get_aligned_len) is adjusted too. The ROOM is not be "expanded". It is counting the column width of the sub wide character string that would be printed, and then comparing with the ROOM it is given to know if the string is enough long to put in the ROOM. The wide character space is also added to check to break the line. Thanks Pearly [-- Attachment #2: 03171200.patch --] [-- Type: text/x-patch, Size: 7685 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157453) +++ intl.h (working copy) @@ -60,6 +60,8 @@ #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 157453) +++ intl.c (working copy) @@ -40,6 +40,9 @@ /* Whether the locale is using UTF-8. */ bool locale_utf8 = false; +static unsigned int get_str_aligned_len (const char *, unsigned int, + unsigned int); + #ifdef ENABLE_NLS /* Initialize the translation library for GCC. This performs the @@ -92,6 +95,7 @@ #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +110,108 @@ return wcswidth (wmsgstr, nwcs); } +static wchar_t * +get_sub_wstr (wchar_t *wstr, int len) +{ + wchar_t *buf; + buf = XNEWVEC (wchar_t, len + 1); + wcsncpy (buf, wstr, len); + buf [len] = L'\0'; + return buf; +} + +/* Return the number of bytes in the multibyte representation of the wide + character WC. */ + +static int +wctomb_nbytes (wchar_t wc) +{ + char *pmb = (char *) xmalloc (MB_CUR_MAX); + int nbytes = wctomb (pmb, wc); + if (pmb) + free (pmb); + return nbytes; +} + +/* Return the length of multibyte string MSGSTR that can be printed within the + width ROOM. REMAINING is the length of string MSGSTR counted by bytes. */ + +static unsigned int +get_wcs_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + int nbytes; + wchar_t *sub_wstr; + unsigned int i; + unsigned int maybe_width; + size_t remaining_width; + + unsigned int len = 0; + unsigned int pos = 0; + unsigned int colwidth = 0; + + int len_bk = len; + + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + remaining_width = wcswidth (wmsgstr, nwcs); + + for (i = 0; i < nwcs; i++) + { + nbytes = wctomb_nbytes (wmsgstr [i]); + maybe_width = colwidth + i - pos; + if (maybe_width == remaining_width) + len = remaining; + else if (colwidth == room || maybe_width >= room) + break; + else if (colwidth > room) + { + len = len_bk; + break; + } + else if (iswspace (wmsgstr [i])) + { + sub_wstr = get_sub_wstr (wmsgstr, i); + len_bk = len; + len = wcstombs (0, sub_wstr, 0); + colwidth = maybe_width; + pos = i; + free (sub_wstr); + } + else if (nbytes > 1) + { + sub_wstr = get_sub_wstr (wmsgstr, i + 1); + len_bk = len; + len = wcstombs (0, sub_wstr, 0); + colwidth = wcswidth (sub_wstr, i + 1); + free (sub_wstr); + /* Keep the puncts printed with the previous word. */ + if (iswpunct (wmsgstr [i + 1]) + && iswalpha (wmsgstr [i -1])) + { + i++; + nbytes = wctomb_nbytes (wmsgstr [i]); + if (nbytes > 1) + { + sub_wstr = get_sub_wstr (wmsgstr, i + 1); + len = wcstombs (0, sub_wstr, 0); + colwidth = wcswidth (sub_wstr, i + 1); + free (sub_wstr); + } + else + { + len = len + 1; + colwidth = colwidth + 1; + } + } + pos = i; + } + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -117,6 +223,13 @@ return strlen (msgstr); } +static unsigned int +get_wcs_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + return get_str_aligned_len (msgstr, room, remaining); +} + #endif #endif /* ENABLE_NLS */ @@ -132,6 +245,13 @@ return plural; } +static unsigned int +get_wcs_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + return get_str_aligned_len (msgstr, room, remaining); +} + #endif /* Return the indent for successive lines, using the width of @@ -148,5 +268,36 @@ return spaces; } +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ +static unsigned int +get_str_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + if (gcc_gettext_width (msgstr) == remaining) + return get_str_aligned_len (msgstr, room, remaining); + else + return get_wcs_aligned_len (msgstr, room, remaining); +} + Index: opts.c =================================================================== --- opts.c (revision 157453) +++ opts.c (working copy) @@ -1149,7 +1149,7 @@ #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, +/* Output ITEM, of length ITEM_WIDTH counted by bytes, in the left column, followed by word-wrapped HELP in a second column. */ static void wrap_help (const char *help, @@ -1158,36 +1158,34 @@ unsigned int columns) { unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int remaining, room, len, item_col_width; remaining = strlen (help); + item_col_width = gcc_gettext_width (item); do { + int n_spaces = item_width <= item_col_width + ? col_width - item_width + : col_width - item_col_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; + len = get_aligned_len (help, room, remaining); - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } - - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1242,6 +1240,7 @@ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1277,10 @@ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-17 7:57 ` Shujing Zhao @ 2010-03-17 12:50 ` Joseph S. Myers 2010-03-18 11:48 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-03-17 12:50 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Wed, 17 Mar 2010, Shujing Zhao wrote: > > > + if (item_width == item_col_width > > > + || item_width == 0) > > > + printf (" %-*.*s", col_width, item_width, item); > > > + /* Handle the item that includes wide character. */ > > > + else > > > > I think the code would be a lot clearer with fewer conditionals. Why do you > > need to handle specially the case where item_width == item_col_width? Can't > > you just use the logic below unconditionally? > The conditional item_width == item_col_width is used to handle the ITEM > without wide characters. As your suggestion, I simplify the logic as But you should be able to treat all ITEMs as if they have wide characters; there should be no need to do something special for the case of no wide characters. > > > + if (len != gcc_gettext_width (help)) > > > + len = get_aligned_len (help, room, remaining); > > > > Again, I'd like to avoid conditionals if possible. > After the logic is moved to intl.c, the conditional is still kept to tell > calling get_str_aligned_len or get_wcs_aligned_len. > > + if (gcc_gettext_width (msgstr) == remaining) > + return get_str_aligned_len (msgstr, room, remaining); > + else > + return get_wcs_aligned_len (msgstr, room, remaining); > > Is it ok or any suggestion? The check still isn't appropriate. The width might equal the number of bytes even when wide characters are present. I think the logic should be: if NLS enabled and the relevant wide character functions available, use get_wcs_aligned_len, otherwise use get_str_aligned_len. This means only one of those functions is used in any compiler configuration, so you can just make them two different implementations of get_aligned_len depending on the configuration, rather than having three function names involved. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-17 12:50 ` Joseph S. Myers @ 2010-03-18 11:48 ` Shujing Zhao 2010-03-18 17:54 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-18 11:48 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 1384 bytes --] On 03/17/2010 08:04 PM, Joseph S. Myers wrote: > > But you should be able to treat all ITEMs as if they have wide characters; > there should be no need to do something special for the case of no wide > characters. > > The check still isn't appropriate. The width might equal the number of > bytes even when wide characters are present. I think the logic should be: > if NLS enabled and the relevant wide character functions available, use > get_wcs_aligned_len, otherwise use get_str_aligned_len. This means only > one of those functions is used in any compiler configuration, so you can > just make them two different implementations of get_aligned_len depending > on the configuration, rather than having three function names involved. > Thanks. I merge the algorithm of line break in function get_aligned_len and don't care if it includes wide characters. Yes, not only the code, but also the logic is clearer. I add a separate conditional directive to get the aligned length when --diable-nls or the relevant interfaces are not available. The definition of function fake_ngettext is moved to the inclusions of ENABLE_NLS, not used a separate "ifndef ENABLE_NLS". That would use less conditionals at intl.c. Tested on i686-pc-linux and enabled nls, the help output isn't changed when the locale is C. Disabled nls, the output isn't changed too. Thanks Pearly [-- Attachment #2: 03181811.patch --] [-- Type: text/x-patch, Size: 6917 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157508) +++ intl.h (working copy) @@ -60,6 +60,8 @@ #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 157508) +++ intl.c (working copy) @@ -92,6 +92,7 @@ #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,102 @@ return wcswidth (wmsgstr, nwcs); } +static wchar_t * +get_sub_wstr (wchar_t *wstr, int len) +{ + wchar_t *buf; + buf = XNEWVEC (wchar_t, len + 1); + wcsncpy (buf, wstr, len); + buf [len] = L'\0'; + return buf; +} + +/* Return the number of bytes in the multibyte representation of the wide + character WC. */ + +static int +wctomb_nbytes (wchar_t wc) +{ + char *pmb = (char *) xmalloc (MB_CUR_MAX); + int nbytes = wctomb (pmb, wc); + if (pmb) + free (pmb); + return nbytes; +} + +/* Return the length of multibyte string MSGSTR that can be printed within the + width ROOM. REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + int nbytes; + wchar_t *sub_wstr; + unsigned int i; + size_t remaining_width; + + unsigned int len = 0; + unsigned int colwidth = 0; + + int len_bk; + + size_t nwcs = mbstowcs (0, msgstr, 0); + wchar_t *wmsgstr = XALLOCAVEC (wchar_t, nwcs + 1); + + mbstowcs (wmsgstr, msgstr, nwcs + 1); + remaining_width = wcswidth (wmsgstr, nwcs); + + for (i = 0; i < nwcs; i++) + { + nbytes = wctomb_nbytes (wmsgstr [i]); + colwidth ++; + + if (colwidth == remaining_width) + len = remaining; + else if (iswspace (wmsgstr [i])) + { + sub_wstr = get_sub_wstr (wmsgstr, i); + len_bk = len = wcstombs (0, sub_wstr, 0); + free (sub_wstr); + } + else if ((wmsgstr[i] == L'-' || wmsgstr[i] == L'/') + && msgstr[i + 1] != L' ' + && i > 0 && iswalpha (msgstr[i - 1])) + { + sub_wstr = get_sub_wstr (wmsgstr, i + 1); + len_bk = len = wcstombs (0, sub_wstr, 0); + colwidth = wcswidth (sub_wstr, i + 1); + free (sub_wstr); + } + else if (nbytes > 1) + { + sub_wstr = get_sub_wstr (wmsgstr, i + 1); + len_bk = len; + len = wcstombs (0, sub_wstr, 0); + colwidth = wcswidth (sub_wstr, i + 1); + free (sub_wstr); + /* Keep the puncts printed with the previous word. */ + if (iswpunct (wmsgstr [i + 1]) + && iswalpha (wmsgstr [i -1])) + { + i++; + sub_wstr = get_sub_wstr (wmsgstr, i + 1); + len = wcstombs (0, sub_wstr, 0); + colwidth = wcswidth (sub_wstr, i + 1); + free (sub_wstr); + } + } + if (colwidth >= room) + { + if (colwidth > room) + len = len_bk; + break; + } + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -119,10 +216,8 @@ #endif -#endif /* ENABLE_NLS */ +#else /* Not defined ENABLE_NLS */ -#ifndef ENABLE_NLS - const char * fake_ngettext (const char *singular, const char *plural, unsigned long n) { @@ -132,6 +227,35 @@ return plural; } +#endif /* ENABLE_NLS */ + +#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \ + && defined HAVE_WORKING_MBSTOWCS \ + && defined HAVE_WCSWIDTH) + +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} + #endif /* Return the indent for successive lines, using the width of Index: opts.c =================================================================== --- opts.c (revision 157508) +++ opts.c (working copy) @@ -1149,7 +1149,7 @@ #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, +/* Output ITEM, of length ITEM_WIDTH counted by bytes, in the left column, followed by word-wrapped HELP in a second column. */ static void wrap_help (const char *help, @@ -1158,36 +1158,34 @@ unsigned int columns) { unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int remaining, room, len, item_col_width; remaining = strlen (help); + item_col_width = gcc_gettext_width (item); do { + int n_spaces = item_width <= item_col_width + ? col_width - item_width + : col_width - item_col_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; + len = get_aligned_len (help, room, remaining); - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } - - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1242,6 +1240,7 @@ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1277,10 @@ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-18 11:48 ` Shujing Zhao @ 2010-03-18 17:54 ` Joseph S. Myers 2010-03-19 11:43 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-03-18 17:54 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Thu, 18 Mar 2010, Shujing Zhao wrote: > > +static wchar_t * > +get_sub_wstr (wchar_t *wstr, int len) Needs comment explaining semantics of parameters and return value. > +/* Return the number of bytes in the multibyte representation of the wide > + character WC. */ > + > +static int > +wctomb_nbytes (wchar_t wc) > +{ This code is heavily assuming that converting from multibyte to wide characters and back again yields exactly the same multibyte characters. This is more likely in the Unicode world than it used to be, but it's not clear we should be presuming it. I think code without this presumption would be simpler than code with it; rather than converting the whole string at once and then repeatedly extracting substrings, you'd convert one character at a time, keeping track of the width at each point (computing the width of one character at a time) and of the byte count to the last possible point for a line break. (That way, you could also easily allow for encodings with shift sequences, keeping an mbstate_t in the course of processing the string.) > +/* Return the length of multibyte string MSGSTR that can be printed within the "length" = number of bytes? > + { > + nbytes = wctomb_nbytes (wmsgstr [i]); > + colwidth ++; Why this increment? Characters may have width 0 (combining characters) or more than 1 and it's not clear why adding 1 should be the correct thing to do here. > + else if (nbytes > 1) Why this conditional? Whatever the number of bytes is, you need to update the width appropriately. > + { > + if (colwidth > room) > + len = len_bk; > + break; Indentation seems very confused. Make sure that new code consistently uses TABs for indenting source code lines. > + int n_spaces = item_width <= item_col_width > + ? col_width - item_width > + : col_width - item_col_width; I don't see why this conditional is here. For printing spaces, it's always the number of columns that's relevant; the number of bytes is irrelevant. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-18 17:54 ` Joseph S. Myers @ 2010-03-19 11:43 ` Shujing Zhao 2010-03-19 13:44 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-19 11:43 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 2959 bytes --] On 03/19/2010 01:03 AM, Joseph S. Myers wrote: > This code is heavily assuming that converting from multibyte to wide > characters and back again yields exactly the same multibyte characters. > This is more likely in the Unicode world than it used to be, but it's not > clear we should be presuming it. I think code without this presumption > would be simpler than code with it; rather than converting the whole > string at once and then repeatedly extracting substrings, you'd convert > one character at a time, keeping track of the width at each point > (computing the width of one character at a time) and of the byte count to > the last possible point for a line break. (That way, you could also > easily allow for encodings with shift sequences, keeping an mbstate_t in > the course of processing the string.) Yes! Thanks again! The source codes are clearer and simpler while choose your method to convert one character at a time. All static functions I have added are removed. Only the main function get_aligned_len are left. >> + { >> + nbytes = wctomb_nbytes (wmsgstr [i]); >> + colwidth ++; > > Why this increment? Characters may have width 0 (combining characters) or > more than 1 and it's not clear why adding 1 should be the correct thing to > do here. It is changed to *length--* at the new logic. + colwidth += wcwidth (wc[0]); + length--; + if (length == 0) + len = remaining; I have tried to used length = length - nbytes to get the new length. But if nbytes > 1 and the character is the last character of the string (length will be 0 if it decreases nbytes), it would not assign the last variable *len* to *len_bk*, then the character that maybe put into the ROOM would not be printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it would be handled specially (just to update the value of len_bk). At the other conditionals, len_bk is always kept consistently with len. > >> + else if (nbytes > 1) > > Why this conditional? Whatever the number of bytes is, you need to update > the width appropriately. At the new logic, the width is always be updated every time when convert one character. > > Indentation seems very confused. Make sure that new code consistently > uses TABs for indenting source code lines. Thanks again. All the lines of the updated patch is checked to use TABs for indenting. > >> + int n_spaces = item_width <= item_col_width >> + ? col_width - item_width >> + : col_width - item_col_width; > > I don't see why this conditional is here. For printing spaces, it's > always the number of columns that's relevant; the number of bytes is > irrelevant. Yes! You are right! The gcc_gettext_width would always return the width whatever the string is. It seems the old argument ITEM_WIDTH is not needed anymore. I removed it from the function wrap_help. Is it ok? Thanks Pearly [-- Attachment #2: 03191800.patch --] [-- Type: text/x-patch, Size: 6082 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157508) +++ intl.h (working copy) @@ -60,6 +60,8 @@ #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 157508) +++ intl.c (working copy) @@ -92,6 +92,7 @@ #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,73 @@ return wcswidth (wmsgstr, nwcs); } +/* Return the number of bytes of multibyte string MSGSTR + that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + size_t nbytes; + wchar_t wc[1]; + + size_t pre_nbytes; + wchar_t pre_wc; + + unsigned int len = 0; + unsigned int colwidth = 0; + unsigned int len_bk; + + mbstate_t state; + memset (&state, '\0', sizeof (state)); + + int length = remaining; + + while ((nbytes = mbrtowc (wc, msgstr, length, &state)) > 0) + { + if (nbytes >= (size_t) -2) + break; + colwidth += wcwidth (wc[0]); + length--; + if (length == 0) + len = remaining; + else if (iswspace (wc[0])) + len_bk = len = remaining - length - nbytes; + else if ((wc[0] == L'-' || wc[0] == L'/') + && iswalpha (pre_wc)) + len_bk = len = remaining - length; + else if (nbytes > 1) + { + len_bk = len; + length = length - nbytes + 1; + len = remaining - length; + pre_wc = wc[0]; + pre_nbytes = nbytes; + if ((nbytes = mbrtowc (wc, msgstr + nbytes, length, &state)) > 0) + { + if (iswpunct (wc[0]) && iswalpha (pre_wc)) + { + len = remaining - length + nbytes; + colwidth += wcwidth(wc[0]); + msgstr += nbytes; + length -= nbytes; + } + nbytes = pre_nbytes; + } + } + if (colwidth >= room) + { + if (colwidth > room) + len = len_bk; + break; + } + msgstr += nbytes; + pre_wc = wc[0]; + } + return len; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -119,10 +187,8 @@ #endif -#endif /* ENABLE_NLS */ +#else /* Not defined ENABLE_NLS */ -#ifndef ENABLE_NLS - const char * fake_ngettext (const char *singular, const char *plural, unsigned long n) { @@ -132,6 +198,35 @@ return plural; } +#endif /* ENABLE_NLS */ + +#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \ + && defined HAVE_WORKING_MBSTOWCS \ + && defined HAVE_WCSWIDTH) + +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} + #endif /* Return the indent for successive lines, using the width of Index: opts.c =================================================================== --- opts.c (revision 157508) +++ opts.c (working copy) @@ -1149,45 +1149,40 @@ #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, - followed by word-wrapped HELP in a second column. */ +/* Output ITEM in the left column, followed by + word-wrapped HELP in a second column. */ static void wrap_help (const char *help, const char *item, - unsigned int item_width, unsigned int columns) { - unsigned int col_width = LEFT_COLUMN; unsigned int remaining, room, len; + unsigned int col_width = LEFT_COLUMN; + unsigned int item_width = gcc_gettext_width (item); remaining = strlen (help); do { + int n_spaces = col_width - item_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; + len = get_aligned_len (help, room, remaining); - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } - - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1226,7 +1221,7 @@ /* Get the translation. */ help = _(help); - wrap_help (help, param, strlen (param), columns); + wrap_help (help, param, columns); } putchar ('\n'); return; @@ -1242,6 +1237,7 @@ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1274,10 @@ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else @@ -1319,7 +1318,7 @@ help = new_help; } - wrap_help (help, opt, len, columns); + wrap_help (help, opt, columns); displayed = true; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-19 11:43 ` Shujing Zhao @ 2010-03-19 13:44 ` Joseph S. Myers 2010-03-22 9:17 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-03-19 13:44 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Fri, 19 Mar 2010, Shujing Zhao wrote: > > Why this increment? Characters may have width 0 (combining characters) or > > more than 1 and it's not clear why adding 1 should be the correct thing to > > do here. > It is changed to *length--* at the new logic. > + colwidth += wcwidth (wc[0]); > + length--; > + if (length == 0) > + len = remaining; > I have tried to used length = length - nbytes to get the new length. But if > nbytes > 1 and the character is the last character of the string (length will > be 0 if it decreases nbytes), it would not assign the last variable *len* to > *len_bk*, then the character that maybe put into the ROOM would not be > printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it > would be handled specially (just to update the value of len_bk). At the other > conditionals, len_bk is always kept consistently with len. I'm afraid none of this makes any sense to me, because I have no idea what "length", "len" and "len_bk" are meant to be, and how they differ. You need clearer variable names. I think you also need comments at various points stating what the values various variables mean at those points, and maybe an overview comment about the algorithm used. It would be best to avoid ambiguous terms such as "length" at all points in variable names and comments, instead talking about numbers of bytes difference between two points, or similar. For example, the present algorithm in opts.c:wrap_help might be described thus: /* Look for a suitable point at which to break HELP. If there is one after at most ROOM columns, take the latest such point. Otherwise, take the first such point in the string, or the end of the string if it cannot be broken. */ If the new algorithm is still like that, you may just need to update variable names (and explain what the interface is for indicating both how much of the string to print, and how much (spaces) to skip before starting on the next line). The key variables to explain are LEN and I. At each iteration in the present loop, something like this might explain them. /* The best candidate point found so far to break HELP is after LEN bytes, where either LEN < I, or LEN == REMAINING if the string cannot be broken within the first I bytes. If we are outside ROOM columns and have found somewhere to break the string, that is the best point available; otherwise, breaking after I bytes will be better if that is a valid point to break. */ Now, things will be a bit more complicated in that you need to track the current column width as well as the current number of bytes of the string that take up that width. But I think the basic idea is still the same - it is just that "length", "len" and "len_bk" are not useful variable names to distinguish the different quantities being tracked and so make clear whether particular arithmetic you are doing on them makes logical sense. I remain very suspicious of anything modifying a byte counter by 1 in this context; if you are ever modifying a byte counter by something other than the number of bytes in the relevant multibyte character I think you are confused and need to sort out the logic of the code so you don't need to do so. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-19 13:44 ` Joseph S. Myers @ 2010-03-22 9:17 ` Shujing Zhao 2010-03-31 7:03 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-22 9:17 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 4299 bytes --] On 03/19/2010 09:26 PM, Joseph S. Myers wrote: > On Fri, 19 Mar 2010, Shujing Zhao wrote: > >>> Why this increment? Characters may have width 0 (combining characters) or >>> more than 1 and it's not clear why adding 1 should be the correct thing to >>> do here. >> It is changed to *length--* at the new logic. >> + colwidth += wcwidth (wc[0]); >> + length--; >> + if (length == 0) >> + len = remaining; >> I have tried to used length = length - nbytes to get the new length. But if >> nbytes > 1 and the character is the last character of the string (length will >> be 0 if it decreases nbytes), it would not assign the last variable *len* to >> *len_bk*, then the character that maybe put into the ROOM would not be >> printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it >> would be handled specially (just to update the value of len_bk). At the other >> conditionals, len_bk is always kept consistently with len. > > I'm afraid none of this makes any sense to me, because I have no idea what > "length", "len" and "len_bk" are meant to be, and how they differ. You > need clearer variable names. I think you also need comments at various > points stating what the values various variables mean at those points, and > maybe an overview comment about the algorithm used. It would be best to > avoid ambiguous terms such as "length" at all points in variable names and > comments, instead talking about numbers of bytes difference between two > points, or similar. > > For example, the present algorithm in opts.c:wrap_help might be described > thus: > > /* Look for a suitable point at which to break HELP. If there is one > after at most ROOM columns, take the latest such point. Otherwise, > take the first such point in the string, or the end of the string if it > cannot be broken. */ > > If the new algorithm is still like that, you may just need to update > variable names (and explain what the interface is for indicating both how > much of the string to print, and how much (spaces) to skip before starting > on the next line). > > The key variables to explain are LEN and I. At each iteration in the > present loop, something like this might explain them. > > /* The best candidate point found so far to break HELP is after LEN > bytes, where either LEN < I, or LEN == REMAINING if the string cannot > be broken within the first I bytes. If we are outside ROOM columns and > have found somewhere to break the string, that is the best point > available; otherwise, breaking after I bytes will be better if that is > a valid point to break. */ > > Now, things will be a bit more complicated in that you need to track the > current column width as well as the current number of bytes of the string > that take up that width. But I think the basic idea is still the same - > it is just that "length", "len" and "len_bk" are not useful variable names > to distinguish the different quantities being tracked and so make clear > whether particular arithmetic you are doing on them makes logical sense. Thanks. I changed the "len_bk" to "pre_len_to_print" as the number of bytes that have been found can be printed at ROOM columns, "len" to "cur_len_to_print" as the possible number of bytes that can be printed at ROOM columns, "length" to left_len as the number of bytes that MSGSTR left to convert to wide character. Some comments are also added as your suggestion. > I remain very suspicious of anything modifying a byte counter by 1 in this > context; if you are ever modifying a byte counter by something other than > the number of bytes in the relevant multibyte character I think you are > confused and need to sort out the logic of the code so you don't need to > do so. Thanks. You are right. I found the way to solve the problem that I have mentioned when I explain why I have to use "length--". It just needs to store the previous point that can be break the string before the new point is set to the end of the string. + left_len -= nbytes; + + /* Compute the point to break the string. */ + if (left_len == 0) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining; + } Is it Ok? Thanks Pearly [-- Attachment #2: 0322.patch --] [-- Type: text/x-patch, Size: 7982 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157508) +++ intl.h (working copy) @@ -60,6 +60,8 @@ #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 157508) +++ intl.c (working copy) @@ -92,6 +92,7 @@ #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,99 @@ return wcswidth (wmsgstr, nwcs); } +/* Look for a suitable point at which to break string MSGSTR. If there is one + after at most ROOM columns, take the latest such point. Otherwise, + take the first such point in the string, or the end of the string if it + cannot be broken. + REMAINING is the number of bytes of MSGSTR. + Return the number of bytes before the point. */ +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + size_t nbytes; + wchar_t wc[1]; + size_t pre_nbytes; + wchar_t pre_wc; + + unsigned int colwidth = 0; + + /* The number of bytes that have been found can be printed at ROOM columns. */ + unsigned int pre_len_to_print; + + /* The possible number of bytes that can be printed at ROOM columns. */ + unsigned int cur_len_to_print = 0; + + /* The number of bytes that MSGSTR left to convert to wide character. */ + int left_len = remaining; + + mbstate_t state; + memset (&state, '\0', sizeof (state)); + + /* Convert one character at a time, keep track of the width at each point + and of the byte count to the last possible point for a line break. The + best candidate point found so far to break MSGSTR is after the space, + some puncts or multibyte character with the number of bytes + CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere + PRE_LEN_TO_PRINT to break the string, that is the best point available; + otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that + is a valid point to break. */ + while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0) + { + if (nbytes >= (size_t) -2) + /* Invalide input string. */ + break; + + colwidth += wcwidth (wc[0]); + left_len -= nbytes; + + /* Compute the point to break the string. */ + if (left_len == 0) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining; + } + else if (iswspace (wc[0])) + pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes; + else if ((wc[0] == L'-' || wc[0] == L'/') + && iswalpha (pre_wc)) + pre_len_to_print = cur_len_to_print = remaining - left_len; + else if (nbytes > 1) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining - left_len; + pre_wc = wc[0]; + pre_nbytes = nbytes; + /* Keep the puncts printed with the previous word. */ + if ((nbytes = mbrtowc (wc, msgstr + nbytes, left_len, &state)) > 0) + { + if (nbytes >= (size_t) -2) + break; + if (iswpunct (wc[0]) && iswalpha (pre_wc)) + { + cur_len_to_print = remaining - left_len + nbytes; + colwidth += wcwidth (wc[0]); + msgstr += nbytes; + left_len -= nbytes; + } + nbytes = pre_nbytes; + } + } + + if (colwidth >= room) + { + /* If the string is outside ROOM columns after counted the last + word, the break point will be set to the previous point it kept. */ + if (colwidth > room) + cur_len_to_print = pre_len_to_print; + break; + } + msgstr += nbytes; + pre_wc = wc[0]; + } + return cur_len_to_print; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -119,10 +213,8 @@ #endif -#endif /* ENABLE_NLS */ +#else /* Not defined ENABLE_NLS */ -#ifndef ENABLE_NLS - const char * fake_ngettext (const char *singular, const char *plural, unsigned long n) { @@ -132,6 +224,35 @@ return plural; } +#endif /* ENABLE_NLS */ + +#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \ + && defined HAVE_WORKING_MBSTOWCS \ + && defined HAVE_WCSWIDTH) + +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} + #endif /* Return the indent for successive lines, using the width of Index: opts.c =================================================================== --- opts.c (revision 157508) +++ opts.c (working copy) @@ -1149,45 +1149,45 @@ #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, - followed by word-wrapped HELP in a second column. */ +/* Output ITEM in the left column and HELP in the second column with the total + COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the + corresponding number of spaces for alignment. If the ROOM is less than + REMAINING, look for a suitable point at which to break HELP, Otherwise, take + take the end of the string. + If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in + the left column from the secode line. */ static void wrap_help (const char *help, const char *item, - unsigned int item_width, unsigned int columns) { + unsigned int remaining, room, len; unsigned int col_width = LEFT_COLUMN; - unsigned int remaining, room, len; + unsigned int item_width = gcc_gettext_width (item); remaining = strlen (help); do { + int n_spaces = col_width - item_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; + len = get_aligned_len (help, room, remaining); - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } - - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1226,7 +1226,7 @@ /* Get the translation. */ help = _(help); - wrap_help (help, param, strlen (param), columns); + wrap_help (help, param, columns); } putchar ('\n'); return; @@ -1242,6 +1242,7 @@ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1279,10 @@ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else @@ -1319,7 +1323,7 @@ help = new_help; } - wrap_help (help, opt, len, columns); + wrap_help (help, opt, columns); displayed = true; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-22 9:17 ` Shujing Zhao @ 2010-03-31 7:03 ` Shujing Zhao 2010-04-06 16:56 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-03-31 7:03 UTC (permalink / raw) To: Shujing Zhao; +Cc: Joseph S. Myers, GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 4524 bytes --] On 03/22/2010 04:41 PM, Shujing Zhao wrote: > On 03/19/2010 09:26 PM, Joseph S. Myers wrote: >> On Fri, 19 Mar 2010, Shujing Zhao wrote: >> >>>> Why this increment? Characters may have width 0 (combining >>>> characters) or >>>> more than 1 and it's not clear why adding 1 should be the correct >>>> thing to >>>> do here. >>> It is changed to *length--* at the new logic. >>> + colwidth += wcwidth (wc[0]); >>> + length--; >>> + if (length == 0) >>> + len = remaining; >>> I have tried to used length = length - nbytes to get the new length. >>> But if >>> nbytes > 1 and the character is the last character of the string >>> (length will >>> be 0 if it decreases nbytes), it would not assign the last variable >>> *len* to >>> *len_bk*, then the character that maybe put into the ROOM would not be >>> printed. That is to say, it presumes the nbytes is 1. If (nbytes > >>> 1), it >>> would be handled specially (just to update the value of len_bk). At >>> the other >>> conditionals, len_bk is always kept consistently with len. >> >> I'm afraid none of this makes any sense to me, because I have no idea >> what "length", "len" and "len_bk" are meant to be, and how they >> differ. You need clearer variable names. I think you also need >> comments at various points stating what the values various variables >> mean at those points, and maybe an overview comment about the >> algorithm used. It would be best to avoid ambiguous terms such as >> "length" at all points in variable names and comments, instead talking >> about numbers of bytes difference between two points, or similar. >> >> For example, the present algorithm in opts.c:wrap_help might be >> described thus: >> >> /* Look for a suitable point at which to break HELP. If there is one >> after at most ROOM columns, take the latest such point. Otherwise, >> take the first such point in the string, or the end of the string >> if it cannot be broken. */ >> >> If the new algorithm is still like that, you may just need to update >> variable names (and explain what the interface is for indicating both >> how much of the string to print, and how much (spaces) to skip before >> starting on the next line). >> >> The key variables to explain are LEN and I. At each iteration in the >> present loop, something like this might explain them. >> >> /* The best candidate point found so far to break HELP is after LEN >> bytes, where either LEN < I, or LEN == REMAINING if the string cannot >> be broken within the first I bytes. If we are outside ROOM columns >> and have found somewhere to break the string, that is the best >> point available; otherwise, breaking after I bytes will be better >> if that is a valid point to break. */ >> >> Now, things will be a bit more complicated in that you need to track >> the current column width as well as the current number of bytes of the >> string that take up that width. But I think the basic idea is still >> the same - it is just that "length", "len" and "len_bk" are not useful >> variable names to distinguish the different quantities being tracked >> and so make clear whether particular arithmetic you are doing on them >> makes logical sense. > Thanks. I changed the "len_bk" to "pre_len_to_print" as the number of > bytes that have been found can be printed at ROOM columns, "len" to > "cur_len_to_print" as the possible number of bytes that can be printed > at ROOM columns, "length" to left_len as the number of bytes that MSGSTR > left to convert to wide character. > Some comments are also added as your suggestion. > >> I remain very suspicious of anything modifying a byte counter by 1 in >> this context; if you are ever modifying a byte counter by something >> other than the number of bytes in the relevant multibyte character I >> think you are confused and need to sort out the logic of the code so >> you don't need to do so. > Thanks. You are right. I found the way to solve the problem that I have > mentioned when I explain why I have to use "length--". It just needs to > store the previous point that can be break the string before the new > point is set to the end of the string. > > + left_len -= nbytes; > + > + /* Compute the point to break the string. */ > + if (left_len == 0) > + { > + pre_len_to_print = cur_len_to_print; > + cur_len_to_print = remaining; > + } > > Is it Ok? > > Thanks > Pearly > Added the ChangeLog. Is it OK? [-- Attachment #2: ChangeLog --] [-- Type: text/plain, Size: 393 bytes --] 2010-03-31 Shujing Zhao <pearly.zhao@oracle.com> * intl.c (get_aligned_len): New function. * intl.h (get_aligned_len): New prototype. * opts.c (wrap_help): Remove the ITEM_WIDTH parameter. Print the option name and help text respectively and add spaces between them when needed. (print_filtered_help): Split the help string to option name and help text and adjust wrap_help caller. [-- Attachment #3: 0331.patch --] [-- Type: text/x-patch, Size: 8370 bytes --] Index: intl.h =================================================================== --- intl.h (revision 157720) +++ intl.h (working copy) @@ -60,6 +60,8 @@ extern const char *fake_ngettext(const c #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 157720) +++ intl.c (working copy) @@ -92,6 +92,7 @@ gcc_init_libintl (void) #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,99 @@ gcc_gettext_width (const char *msgstr) return wcswidth (wmsgstr, nwcs); } +/* Look for a suitable point at which to break string MSGSTR. If there is one + after at most ROOM columns, take the latest such point. Otherwise, + take the first such point in the string, or the end of the string if it + cannot be broken. + REMAINING is the number of bytes of MSGSTR. + Return the number of bytes before the point. */ +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + size_t nbytes; + wchar_t wc[1]; + size_t pre_nbytes; + wchar_t pre_wc; + + unsigned int colwidth = 0; + + /* The number of bytes that have been found can be printed at ROOM columns. */ + unsigned int pre_len_to_print; + + /* The possible number of bytes that can be printed at ROOM columns. */ + unsigned int cur_len_to_print = 0; + + /* The number of bytes that MSGSTR left to convert to wide character. */ + int left_len = remaining; + + mbstate_t state; + memset (&state, '\0', sizeof (state)); + + /* Convert one character at a time, keep track of the width at each point + and of the byte count to the last possible point for a line break. The + best candidate point found so far to break MSGSTR is after the space, + some puncts or multibyte character with the number of bytes + CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere + PRE_LEN_TO_PRINT to break the string, that is the best point available; + otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that + is a valid point to break. */ + while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0) + { + if (nbytes >= (size_t) -2) + /* Invalide input string. */ + break; + + colwidth += wcwidth (wc[0]); + left_len -= nbytes; + + /* Compute the point to break the string. */ + if (left_len == 0) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining; + } + else if (iswspace (wc[0])) + pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes; + else if ((wc[0] == L'-' || wc[0] == L'/') + && iswalpha (pre_wc)) + pre_len_to_print = cur_len_to_print = remaining - left_len; + else if (nbytes > 1) + { + pre_len_to_print = cur_len_to_print; + cur_len_to_print = remaining - left_len; + pre_wc = wc[0]; + pre_nbytes = nbytes; + /* Keep the puncts printed with the previous word. */ + if ((nbytes = mbrtowc (wc, msgstr + nbytes, left_len, &state)) > 0) + { + if (nbytes >= (size_t) -2) + break; + if (iswpunct (wc[0]) && iswalpha (pre_wc)) + { + cur_len_to_print = remaining - left_len + nbytes; + colwidth += wcwidth (wc[0]); + msgstr += nbytes; + left_len -= nbytes; + } + nbytes = pre_nbytes; + } + } + + if (colwidth >= room) + { + /* If the string is outside ROOM columns after counted the last + word, the break point will be set to the previous point it kept. */ + if (colwidth > room) + cur_len_to_print = pre_len_to_print; + break; + } + msgstr += nbytes; + pre_wc = wc[0]; + } + return cur_len_to_print; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -119,9 +213,7 @@ gcc_gettext_width (const char *msgstr) #endif -#endif /* ENABLE_NLS */ - -#ifndef ENABLE_NLS +#else /* Not defined ENABLE_NLS */ const char * fake_ngettext (const char *singular, const char *plural, unsigned long n) @@ -132,6 +224,35 @@ fake_ngettext (const char *singular, con return plural; } +#endif /* ENABLE_NLS */ + +#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \ + && defined HAVE_WORKING_MBSTOWCS \ + && defined HAVE_WCSWIDTH) + +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} + #endif /* Return the indent for successive lines, using the width of Index: opts.c =================================================================== --- opts.c (revision 157720) +++ opts.c (working copy) @@ -1149,45 +1149,45 @@ decode_options (unsigned int argc, const #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, - followed by word-wrapped HELP in a second column. */ +/* Output ITEM in the left column and HELP in the second column with the total + COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the + corresponding number of spaces for alignment. If the ROOM is less than + REMAINING, look for a suitable point at which to break HELP, Otherwise, take + take the end of the string. + If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in + the left column from the secode line. */ static void wrap_help (const char *help, const char *item, - unsigned int item_width, unsigned int columns) { - unsigned int col_width = LEFT_COLUMN; unsigned int remaining, room, len; + unsigned int col_width = LEFT_COLUMN; + unsigned int item_width = gcc_gettext_width (item); remaining = strlen (help); do { + int n_spaces = col_width - item_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; - - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } + len = get_aligned_len (help, room, remaining); - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1226,7 +1226,7 @@ print_filtered_help (unsigned int includ /* Get the translation. */ help = _(help); - wrap_help (help, param, strlen (param), columns); + wrap_help (help, param, columns); } putchar ('\n'); return; @@ -1242,6 +1242,7 @@ print_filtered_help (unsigned int includ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1278,7 +1279,10 @@ print_filtered_help (unsigned int includ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else @@ -1319,7 +1323,7 @@ print_filtered_help (unsigned int includ help = new_help; } - wrap_help (help, opt, len, columns); + wrap_help (help, opt, columns); displayed = true; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-03-31 7:03 ` Shujing Zhao @ 2010-04-06 16:56 ` Joseph S. Myers 2010-04-13 11:32 ` Shujing Zhao 0 siblings, 1 reply; 20+ messages in thread From: Joseph S. Myers @ 2010-04-06 16:56 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Wed, 31 Mar 2010, Shujing Zhao wrote: > + unsigned int colwidth = 0; No comment on this variable. > + /* The number of bytes that have been found can be printed at ROOM columns. */ > + unsigned int pre_len_to_print; The semantics of this variable seem confusing to me. There isn't a clear difference between "have been found can be printed" and "can be printed". > + /* The possible number of bytes that can be printed at ROOM columns. */ > + unsigned int cur_len_to_print = 0; I think you mean something like "The number of bytes that can be printed at the best possible line-breaking position found so far, or 0 if no suitable break has been found.". Or is that pre_len_to_print? > + /* The number of bytes that MSGSTR left to convert to wide character. */ Conversion to wide characters is part of the implementation of this loop, not its interface. "The number of bytes of MSGSTR not yet examined for possible line breaks."? > + /* Convert one character at a time, keep track of the width at each point > + and of the byte count to the last possible point for a line break. The > + best candidate point found so far to break MSGSTR is after the space, > + some puncts or multibyte character with the number of bytes > + CUR_LEN_TO_PRINT. If we are outside ROOM columns and have found somewhere Do you mean "is after CUR_LEN_TO_PRINT bytes of the string"? Saying "with the number of bytes" sounds like counting the bytes in a single character, rather than being a position within the string. > + PRE_LEN_TO_PRINT to break the string, that is the best point available; Previously you said CUR_LEN_TO_PRINT is the best point so far. Now you're saying PRE_LEN_TO_PRINT is best. Which is? > + otherwise, breaking after CUR_LEN_TO_PRINT bytes will be better if that > + is a valid point to break. */ Why wouldn't it be valid, if it was previously the best valid point? Perhaps you need to distinguish clearly whether a variable's value represents what is known to be a valid point for breaking the string, or might be one subject to further tests. "cur_" and "pre_" do not seem adequate to represent that distinction. The variable for a value still subject to further tests (if needed at all) should be local to a single iteration of the loop. > + while ((nbytes = mbrtowc (wc, msgstr, left_len, &state)) > 0) > + { > + if (nbytes >= (size_t) -2) > + /* Invalide input string. */ "Invalid". > + /* Compute the point to break the string. */ > + if (left_len == 0) > + { > + pre_len_to_print = cur_len_to_print; > + cur_len_to_print = remaining; I think it might be clearer to have the loop in the form while (left_len > 0) which at least would avoid questions of what mbrtowc is meant to do when given a length of 0. After the loop, if cur_len_to_print is 0 or the entire string would fit in the given number of columns, set cur_len_to_print accordingly to indicate that the whole string should be printed. Then make the loop mimic that in the non-i18n version of the function as much as possible: if the previous columns occupy at least the full width and a point to break has been found, then break; otherwise decode a character; if a space then breaking *before* the space is OK; if '-' or '/' (and the other conditions apply) then breaking *after* that character is OK. Incidentally, it would be a lot clearer if you had a single variable for the byte index into the string (like the non-i18n version) instead of both increasing msgstr and decreasing left_len - so the loop would be "while (i < remaining)". You wouldn't need to do "remaining - left_len" computations that way. > + } > + else if (iswspace (wc[0])) > + pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes; > + else if ((wc[0] == L'-' || wc[0] == L'/') > + && iswalpha (pre_wc)) > + pre_len_to_print = cur_len_to_print = remaining - left_len; Will pre_wc always be initialized here, if the string starts with '-' or '/'? > + else if (nbytes > 1) I still see no reason you should need conditionals on the number of bytes in a character, instead of always working with characters regardless of the number of bytes in them. > + { > + pre_len_to_print = cur_len_to_print; > + cur_len_to_print = remaining - left_len; > + pre_wc = wc[0]; > + pre_nbytes = nbytes; > + /* Keep the puncts printed with the previous word. */ I don't understand what issue you are dealing with here, but punctuation other than '/' or '-' isn't considered a valid line-breaking point anyway so you shouldn't need special code for it. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-04-06 16:56 ` Joseph S. Myers @ 2010-04-13 11:32 ` Shujing Zhao 2010-04-15 21:54 ` Joseph S. Myers 0 siblings, 1 reply; 20+ messages in thread From: Shujing Zhao @ 2010-04-13 11:32 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Patches, Paolo Carlini [-- Attachment #1: Type: text/plain, Size: 4393 bytes --] On 04/07/2010 12:55 AM, Joseph S. Myers wrote: Thanks. Fix the typo and all the comments. Change the confused variable cur_len_to_print to len_to_print. >> + /* Compute the point to break the string. */ >> + if (left_len == 0) >> + { >> + pre_len_to_print = cur_len_to_print; >> + cur_len_to_print = remaining; > > I think it might be clearer to have the loop in the form > > while (left_len > 0) > > which at least would avoid questions of what mbrtowc is meant to do when > given a length of 0. After the loop, if cur_len_to_print is 0 or the > entire string would fit in the given number of columns, set > cur_len_to_print accordingly to indicate that the whole string should be > printed. Then make the loop mimic that in the non-i18n version of the > function as much as possible: if the previous columns occupy at least the > full width and a point to break has been found, then break; otherwise > decode a character; if a space then breaking *before* the space is OK; if > '-' or '/' (and the other conditions apply) then breaking *after* that > character is OK. > > Incidentally, it would be a lot clearer if you had a single variable for > the byte index into the string (like the non-i18n version) instead of both > increasing msgstr and decreasing left_len - so the loop would be "while (i > < remaining)". You wouldn't need to do "remaining - left_len" > computations that way. Both pre_len_to_print and len_to_print are initialized REMAINING now. If no suitable line-breaking position has been found, the whole line will be printed. a variable i is added to index the string to decode and use while (i < remaining) to control the loop. It looks clear. Thanks. > >> + } >> + else if (iswspace (wc[0])) >> + pre_len_to_print = cur_len_to_print = remaining - left_len - nbytes; >> + else if ((wc[0] == L'-' || wc[0] == L'/') >> + && iswalpha (pre_wc)) >> + pre_len_to_print = cur_len_to_print = remaining - left_len; > > Will pre_wc always be initialized here, if the string starts with '-' or > '/'? It is changed to + else if ((wc[0] == L'-' || wc[0] == L'/') + && i > nbytes + && iswalpha (pre_wc)) (i > nbytes) would ensure the pre_wc is initialized. I found this solution from the non-i18n version. > >> + else if (nbytes > 1) > > I still see no reason you should need conditionals on the number of bytes > in a character, instead of always working with characters regardless of > the number of bytes in them. This condition is to make the string can be break after a multi bytes wide alphabetic or punctuations. The line can be break after every multi bytes wide alphabetic or punctuations. I think the nbytes can distinguish if it is a wide character before decode. The letter 'a' is recognized wide character after the decoding, but the line can't be broken after a 'a' at the string "after". I think the difference between one byte wide character and the multi bytes wide character is the nbytes. Is it better changed to + else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1) better? > >> + { >> + pre_len_to_print = cur_len_to_print; >> + cur_len_to_print = remaining - left_len; >> + pre_wc = wc[0]; >> + pre_nbytes = nbytes; >> + /* Keep the puncts printed with the previous word. */ > > I don't understand what issue you are dealing with here, but punctuation > other than '/' or '-' isn't considered a valid line-breaking point anyway > so you shouldn't need special code for it. > Since the line can be broken after a multi bytes alphabetic, the following punctuation maybe be printed as the first character at the new line if the punctuation didn't be dealing with. I also found at some special cases, only a "full stop" printed at a new line. It looks strange. So I design to print these punctuation with the previous alphabetic. That is to say, if a wide character followed by a wide character punctuation, the possible line-breaking position would not be set after this wide character. It would be set after the punctuation. yeah, the '/' and '-' must be considered. How about used (nbytes > 1) again to distinguish the multi bytes wide character punctuation? It at least ensure the Chinese multi bytes punctuations are handled. The other punctuation will be handle just like the non-i18n version. Thanks Pearly [-- Attachment #2: 0413.patch --] [-- Type: text/x-patch, Size: 8546 bytes --] Index: intl.h =================================================================== --- intl.h (revision 158214) +++ intl.h (working copy) @@ -61,6 +61,8 @@ extern const char *fake_ngettext(const c #endif extern char *get_spaces (const char *); +extern unsigned int get_aligned_len (const char *, unsigned int, + unsigned int); extern const char *open_quote; extern const char *close_quote; Index: intl.c =================================================================== --- intl.c (revision 158214) +++ intl.c (working copy) @@ -92,6 +92,7 @@ gcc_init_libintl (void) #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH #include <wchar.h> +#include <wctype.h> /* Returns the width in columns of MSGSTR, which came from gettext. This is for indenting subsequent output. */ @@ -106,6 +107,103 @@ gcc_gettext_width (const char *msgstr) return wcswidth (wmsgstr, nwcs); } +/* Look for a suitable point at which to break string MSGSTR. If there is one + after at most ROOM columns, take the latest such point. Otherwise, + take the first such point in the string, or the end of the string if it + cannot be broken. + REMAINING is the number of bytes of MSGSTR. + Return the number of bytes before the point. */ +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + size_t nbytes; + wchar_t wc[1]; + wchar_t pre_wc; + + /* The byte index of the string when convert to a wide character. */ + unsigned int i = 0; + + /* The number of column positions for the printed wide-character string. */ + unsigned int colwidth = 0; + + /* The number of bytes that may be printed at the possible line-breaking + position found so far, or REMAINING if no suitable break has been found. */ + unsigned int len_to_print = remaining; + + /* The previous number of bytes that can be printed at the possible + line-breaking position found so far, or REMAINING if no suitable + break has been found. */ + unsigned int pre_len_to_print = remaining; + + mbstate_t state; + memset (&state, '\0', sizeof (state)); + + /* Convert one character at a time, keep track of the width at each point + and of the byte count to the last possible point for a line break. The + best candidate point found so far to break MSGSTR is after the space, + '-', '/', the multi bytes puncts or the multi bytes alphabetic. + If we are outside ROOM columns after LEN_TO_PRINT and have found + somewhere PRE_LEN_TO_PRINT to break the string, PRE_LEN_TO_PRINT is the + best point available; otherwise, the string will be break after + LEN_TO_PRINT. */ + do + { + nbytes = mbrtowc (wc, msgstr, remaining - i, &state); + if (nbytes >= (size_t) -2) + /* Invalid input string. */ + break; + + i += nbytes; + colwidth += wcwidth (wc[0]); + + /* Compute the point to break the string. */ + if (i == remaining) + { + pre_len_to_print = len_to_print; + len_to_print = remaining; + } + else if (iswspace (wc[0])) + pre_len_to_print = len_to_print = i - nbytes; + else if ((wc[0] == L'-' || wc[0] == L'/') + && i > nbytes + && iswalpha (pre_wc)) + pre_len_to_print = len_to_print = i; + else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1) + { + size_t nbytes_next; + pre_len_to_print = len_to_print; + len_to_print = i; + pre_wc = wc[0]; + nbytes_next = mbrtowc (wc, msgstr + nbytes, remaining - i, &state); + if (nbytes_next >= (size_t) -2) + break; + /* Keep the multi bytes puncts printed with the previous word. */ + if (iswpunct (wc[0]) && iswalpha (pre_wc) && nbytes_next > 1) + { + i += nbytes_next; + colwidth += wcwidth (wc[0]); + len_to_print = i; + msgstr += nbytes_next; + } + } + + if (colwidth >= room) + { + /* If the string is outside ROOM columns after the possible + LEN_TO_PRINT, the break point will be set to the previous point. */ + if (colwidth > room) + len_to_print = pre_len_to_print; + break; + } + + msgstr += nbytes; + pre_wc = wc[0]; + } + while (i < remaining); + return len_to_print; +} + #else /* no wcswidth */ /* We don't have any way of knowing how wide the string is. Guess @@ -119,9 +217,7 @@ gcc_gettext_width (const char *msgstr) #endif -#endif /* ENABLE_NLS */ - -#ifndef ENABLE_NLS +#else /* Not defined ENABLE_NLS */ const char * fake_ngettext (const char *singular, const char *plural, unsigned long n) @@ -132,6 +228,35 @@ fake_ngettext (const char *singular, con return plural; } +#endif /* ENABLE_NLS */ + +#if !(defined ENABLE_NLS) || !(defined HAVE_WCHAR_H \ + && defined HAVE_WORKING_MBSTOWCS \ + && defined HAVE_WCSWIDTH) + +/* Return the length of MSGSTR that can be printed within the width ROOM. + REMAINING is the length of string MSGSTR counted by bytes. */ + +unsigned int +get_aligned_len (const char *msgstr, unsigned int room, + unsigned int remaining) +{ + unsigned int i; + unsigned int len = remaining; + for (i = 0; msgstr[i]; i++) + { + if (i >= room && len != remaining) + break; + if (msgstr[i] == ' ') + len = i; + else if ((msgstr[i] == '-' || msgstr[i] == '/') + && msgstr[i + 1] != ' ' + && i > 0 && ISALPHA (msgstr[i - 1])) + len = i + 1; + } + return len; +} + #endif /* Return the indent for successive lines, using the width of Index: opts.c =================================================================== --- opts.c (revision 158214) +++ opts.c (working copy) @@ -1143,45 +1143,45 @@ decode_options (unsigned int argc, const #define LEFT_COLUMN 27 -/* Output ITEM, of length ITEM_WIDTH, in the left column, - followed by word-wrapped HELP in a second column. */ +/* Output ITEM in the left column and HELP in the second column with the total + COLUMNS columns. If the ITEM_WIDTH is less than LEFT_COLUMN, print the + corresponding number of spaces for alignment. If the ROOM is less than + REMAINING, look for a suitable point at which to break HELP, Otherwise, take + take the end of the string. + If the HELP is broken to more than one lines, ouput LEFT_COLUMN spaces in + the left column from the secode line. */ static void wrap_help (const char *help, const char *item, - unsigned int item_width, unsigned int columns) { - unsigned int col_width = LEFT_COLUMN; unsigned int remaining, room, len; + unsigned int col_width = LEFT_COLUMN; + unsigned int item_width = gcc_gettext_width (item); remaining = strlen (help); do { + int n_spaces = col_width - item_width; + printf (" %s", item); + if (n_spaces > 0) + { + const char *space = " "; + printf ("%*s", n_spaces, space); + } + room = columns - 3 - MAX (col_width, item_width); if (room > columns) room = 0; len = remaining; if (room < len) - { - unsigned int i; - - for (i = 0; help[i]; i++) - { - if (i >= room && len != remaining) - break; - if (help[i] == ' ') - len = i; - else if ((help[i] == '-' || help[i] == '/') - && help[i + 1] != ' ' - && i > 0 && ISALPHA (help[i - 1])) - len = i + 1; - } - } + len = get_aligned_len (help, room, remaining); - printf( " %-*.*s %.*s\n", col_width, item_width, item, len, help); - item_width = 0; + printf ( " %.*s\n", len, help); + item = " "; + item_width = 1; while (help[len] == ' ') len++; help += len; @@ -1220,7 +1220,7 @@ print_filtered_help (unsigned int includ /* Get the translation. */ help = _(help); - wrap_help (help, param, strlen (param), columns); + wrap_help (help, param, columns); } putchar ('\n'); return; @@ -1236,6 +1236,7 @@ print_filtered_help (unsigned int includ unsigned int len; const char *opt; const char *tab; + char *buf; if (include_flags == 0 || ((option->flags & include_flags) != include_flags)) @@ -1272,7 +1273,10 @@ print_filtered_help (unsigned int includ if (tab) { len = tab - help; - opt = help; + buf = (char *) alloca (len + 1); + strncpy (buf, help, len); + buf[len] = '\0'; + opt = buf; help = tab + 1; } else @@ -1313,7 +1317,7 @@ print_filtered_help (unsigned int includ help = new_help; } - wrap_help (help, opt, len, columns); + wrap_help (help, opt, columns); displayed = true; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH PR/42686] Align the help text output 2010-04-13 11:32 ` Shujing Zhao @ 2010-04-15 21:54 ` Joseph S. Myers 0 siblings, 0 replies; 20+ messages in thread From: Joseph S. Myers @ 2010-04-15 21:54 UTC (permalink / raw) To: Shujing Zhao; +Cc: GCC Patches, Paolo Carlini On Tue, 13 Apr 2010, Shujing Zhao wrote: > > > + else if (nbytes > 1) > > > > I still see no reason you should need conditionals on the number of bytes in > > a character, instead of always working with characters regardless of the > > number of bytes in them. > This condition is to make the string can be break after a multi bytes wide > alphabetic or punctuations. The line can be break after every multi bytes wide > alphabetic or punctuations. I think the nbytes can distinguish if it is a wide > character before decode. The letter 'a' is recognized wide character after the > decoding, but the line can't be broken after a 'a' at the string "after". I > think the difference between one byte wide character and the multi bytes wide > character is the nbytes. > > Is it better changed to > > + else if ((iswalpha (wc[0]) || iswpunct (wc[0])) && nbytes > 1) > > better? I still cannot make sense of what the logical condition is that this code is trying to implement; that is, the logical properties of a pair of characters and what the conclusion is from those properties about whether a break is or is not permitted at a particular location in relation to those characters. You have a series of conditions that might be described something like (and it's possible I'm not understanding your intent): /* We can break at the end of the string if it is narrow enough. */ /* If there is a space, we can break before the space (and not print the space). */ /* Break after '-' or '/' if the previous character was alphabetical, but not in the middle of "--" (for example). */ But what in plain English is the rule this code is trying to implement for a condition on breaking or not breaking the string? Whatever the condition is, it relates to some logical properties of the characters in question. The number of bytes is not a logical property; it's a physical property of the particular locale character set and may vary from character set to character set (gettext will automatically translate using iconv to the LC_CTYPE character set). -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2010-04-15 21:51 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-02-25 10:32 [PATCH PR/42686] Align the help text output Shujing Zhao 2010-02-25 16:34 ` Joseph S. Myers 2010-03-10 10:26 ` Shujing Zhao 2010-03-10 17:19 ` Joseph S. Myers 2010-03-12 11:34 ` Shujing Zhao 2010-03-12 17:09 ` Joseph S. Myers 2010-03-15 9:28 ` Shujing Zhao 2010-03-15 10:58 ` Shujing Zhao 2010-03-15 11:42 ` Joseph S. Myers 2010-03-17 7:57 ` Shujing Zhao 2010-03-17 12:50 ` Joseph S. Myers 2010-03-18 11:48 ` Shujing Zhao 2010-03-18 17:54 ` Joseph S. Myers 2010-03-19 11:43 ` Shujing Zhao 2010-03-19 13:44 ` Joseph S. Myers 2010-03-22 9:17 ` Shujing Zhao 2010-03-31 7:03 ` Shujing Zhao 2010-04-06 16:56 ` Joseph S. Myers 2010-04-13 11:32 ` Shujing Zhao 2010-04-15 21:54 ` Joseph S. Myers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).