public inbox for newlib-cvs@sourceware.org
* [newlib-cygwin] Locale modifier @cjkwide to adjust ambiguous-width in non-CJK locales
@ 2018-03-05 16:25 Corinna Vinschen
From: Corinna Vinschen @ 2018-03-05 16:25 UTC
  To: newlib-cvs

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=f92f048528e6f74a0a0d11e897e536080cc012e3

commit f92f048528e6f74a0a0d11e897e536080cc012e3
Author: Thomas Wolff <towo@towo.net>
Date:   Fri Mar 2 20:21:09 2018 +0100

    Locale modifier @cjkwide to adjust ambiguous-width in non-CJK locales
    
    The locale modifier @cjkwide makes Unicode "ambiguous width" characters
    wide, so they can be forced to width 2 even in non-CJK locales. This
    lets users of, e.g., "Powerline symbols" give those characters the width
    they want (and the width some tools apparently expect) without having to
    set a CJK locale, and without losing consistency between the terminal's
    character width and the locale width reported by wcwidth/wcswidth.

Diff:
---
 newlib/libc/locale/locale.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index baa5451..557982d 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -74,15 +74,16 @@ Cygwin additionally supports locales from the file
 (<<"">> is also accepted; if given, the settings are read from the
 corresponding LC_* environment variables and $LANG according to POSIX rules.)
 
-This implementation also supports the modifier <<"cjknarrow">>, which
-affects how the functions <<wcwidth>> and <<wcswidth>> handle characters
-from the "CJK Ambiguous Width" category of characters described at
-http://www.unicode.org/reports/tr11/#Ambiguous. These characters have a width
-of 1 for singlebyte charsets and a width of 2 for multibyte charsets
-other than UTF-8. For UTF-8, their width depends on the language specifier:
+This implementation also supports the modifiers <<"cjknarrow">> and
+<<"cjkwide">>, which affect how the functions <<wcwidth>> and <<wcswidth>>
+handle characters from the "CJK Ambiguous Width" category of characters
+described at http://www.unicode.org/reports/tr11/#Ambiguous.
+These characters have a width of 1 for singlebyte charsets and a width of 2
+for multibyte charsets other than UTF-8.
+For UTF-8, their width depends on the language specifier:
 it is 2 for <<"zh">> (Chinese), <<"ja">> (Japanese), and <<"ko">> (Korean),
-and 1 for everything else. Specifying <<"cjknarrow">> forces a width of 1,
-independent of charset and language.
+and 1 for everything else. Specifying <<"cjknarrow">> or <<"cjkwide">>
+forces a width of 1 or 2, respectively, independent of charset and language.
 
 If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns a
 pointer to the string representing the current locale.  The acceptable
@@ -480,6 +481,7 @@ __loadlocale (struct __locale_t *loc, int category, const char *new_locale)
   wctomb_p l_wctomb;
   mbtowc_p l_mbtowc;
   int cjknarrow = 0;
+  int cjkwide = 0;
 
   /* Avoid doing everything twice if nothing has changed.
 
@@ -593,11 +595,13 @@ restart:
   if (c && c[0] == '@')
     {
       /* Modifier */
-      /* Only one modifier is recognized right now.  "cjknarrow" is used
-         to modify the behaviour of wcwidth() for East Asian languages.
+      /* Modifiers "cjknarrow" or "cjkwide" are recognized to modify the
+         behaviour of wcwidth() and wcswidth() for East Asian languages.
          For details see the comment at the end of this function. */
       if (!strcmp (c + 1, "cjknarrow"))
 	cjknarrow = 1;
+      else if (!strcmp (c + 1, "cjkwide"))
+	cjkwide = 1;
     }
   /* We only support this subset of charsets. */
   switch (charset[0])
@@ -894,12 +898,15 @@ restart:
          single-byte charsets, and double width for multi-byte charsets
          other than UTF-8. For UTF-8, use double width for the East Asian
          languages ("ja", "ko", "zh"), and single width for everything else.
-         Single width can also be forced with the "@cjknarrow" modifier. */
-      loc->cjk_lang = !cjknarrow && mbc_max > 1
-		      && (charset[0] != 'U'
-			  || strncmp (locale, "ja", 2) == 0
-			  || strncmp (locale, "ko", 2) == 0
-			  || strncmp (locale, "zh", 2) == 0);
+         Single width can also be forced with the "@cjknarrow" modifier.
+         Double width can also be forced with the "@cjkwide" modifier.
+       */
+      loc->cjk_lang = cjkwide ||
+		      (!cjknarrow && mbc_max > 1
+		       && (charset[0] != 'U'
+			   || strncmp (locale, "ja", 2) == 0
+			   || strncmp (locale, "ko", 2) == 0
+			   || strncmp (locale, "zh", 2) == 0));
 #ifdef __HAVE_LOCALE_INFO__
       ret = __ctype_load_locale (loc, locale, (void *) l_wctomb, charset,
 				 mbc_max);

