public inbox for cygwin-cvs@sourceware.org
* [newlib-cygwin/main] Cygwin: mbrtowi: define replacement for mbrtowc, returning UTF-32 value
@ 2023-02-14 12:09 Corinna Vinschen
From: Corinna Vinschen @ 2023-02-14 12:09 UTC (permalink / raw)
To: cygwin-cvs
https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=60c25da90d015f27c5697c6db7ab0557585d09aa
commit 60c25da90d015f27c5697c6db7ab0557585d09aa
Author: Corinna Vinschen <corinna@vinschen.de>
AuthorDate: Tue Feb 14 12:20:20 2023 +0100
Commit: Corinna Vinschen <corinna@vinschen.de>
CommitDate: Tue Feb 14 12:20:20 2023 +0100
Cygwin: mbrtowi: define replacement for mbrtowc, returning UTF-32 value
Since UTF-16 cannot hold every Unicode character in a single
wchar_t, we need a function that returns a wint_t holding the
UTF-32 value, for use in comparison functions. Fortunately, the important
wide character functions like towupper/towlower, isw<class>, iswctype,
etc., already take wint_t values, and newlib handles them as UTF-32.
If only we had switched wchar_t to 32 bit way back when... sigh.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
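
To illustrate why a single 16-bit wchar_t is not enough, here is a minimal sketch of how a code point above 0xFFFF is split into a UTF-16 surrogate pair. The helper name `utf32_to_utf16` is hypothetical and not part of this commit; it uses fixed-width `uint16_t`/`uint32_t` types so that it behaves the same on platforms where wchar_t is 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the commit: encode one Unicode code
   point as UTF-16.  Returns the number of 16-bit units written (1 or 2),
   or 0 if cp is not a valid Unicode scalar value.  */
static size_t
utf32_to_utf16 (uint32_t cp, uint16_t out[2])
{
  /* Reject values beyond U+10FFFF and the surrogate range itself.  */
  if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
    return 0;
  /* Code points in the Basic Multilingual Plane fit in one unit.  */
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;
      return 1;
    }
  /* Everything else needs a high and a low surrogate, 10 bits each.  */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xd800 | (cp >> 10));
  out[1] = (uint16_t) (0xdc00 | (cp & 0x3ff));
  return 2;
}
```

For example, U+1F600 becomes the two units 0xD83D and 0xDE00, so a caller reading one 16-bit wchar_t at a time never sees the full character.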
Diff:
---
winsup/cygwin/local_includes/wchar.h | 4 ++++
winsup/cygwin/strfuncs.cc | 32 ++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/winsup/cygwin/local_includes/wchar.h b/winsup/cygwin/local_includes/wchar.h
index b2ddd457568f..3d746c29b9bf 100644
--- a/winsup/cygwin/local_includes/wchar.h
+++ b/winsup/cygwin/local_includes/wchar.h
@@ -39,6 +39,10 @@ extern wctomb_f __utf8_wctomb;
#define __WCTOMB (__get_current_locale ()->wctomb)
+/* replacement function for mbrtowc, returning a wint_t representing
+ a UTF-32 value. Defined in strfuncs.cc */
+extern wint_t mbrtowi (wint_t *, const char *, size_t, mbstate_t *);
+
#ifdef __cplusplus
}
#endif
diff --git a/winsup/cygwin/strfuncs.cc b/winsup/cygwin/strfuncs.cc
index 0ab2290539a8..0b9d8ac1f639 100644
--- a/winsup/cygwin/strfuncs.cc
+++ b/winsup/cygwin/strfuncs.cc
@@ -112,6 +112,38 @@ transform_chars_af_unix (PWCHAR out, const char *path, __socklen_t len)
return out;
}
+/* replacement function for mbrtowc, returning a wint_t representing
+   a UTF-32 value. */
+extern "C" wint_t
+mbrtowi (wint_t *pwi, const char *s, size_t n, mbstate_t *ps)
+{
+  size_t len, len2;
+  wchar_t w1, w2;
+
+  len = mbrtowc (&w1, s, n, ps);
+  if (len == (size_t) -1 || len == (size_t) -2)
+    return len;
+  *pwi = w1;
+  /* Convert surrogate pair to wint_t value */
+  if (len > 0 && w1 >= 0xd800 && w1 <= 0xdbff)
+    {
+      s += len;
+      n -= len;
+      len2 = mbrtowc (&w2, s, n, ps);
+      if (len2 > 0 && w2 >= 0xdc00 && w2 <= 0xdfff)
+        {
+          len += len2;
+          *pwi = (((w1 & 0x3ff) << 10) | (w2 & 0x3ff)) + 0x10000;
+        }
+      else
+        {
+          len = (size_t) -1;
+          errno = EILSEQ;
+        }
+    }
+  return len;
+}
+
/* The SJIS, JIS and eucJP conversion in newlib does not use UTF as
wchar_t character representation. That's unfortunate for us since
we require UTF for the OS. What we do here is to have our own
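
The surrogate handling in the patch can be sketched in isolation, without the multibyte decoding step. The following is a simplified stand-in, not the Cygwin function: `utf16_to_utf32` is a hypothetical name, and it works on an array of already-decoded 16-bit units instead of calling mbrtowc. It uses the same range checks and the same combining arithmetic as mbrtowi above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the surrogate logic in mbrtowi: decode one
   code point from up to n UTF-16 units.  Returns the number of units
   consumed (1 or 2), 0 if n is 0, or (size_t) -1 with errno set to
   EILSEQ if a high surrogate is not followed by a low surrogate.  */
static size_t
utf16_to_utf32 (uint32_t *pwi, const uint16_t *u, size_t n)
{
  if (n == 0)
    return 0;
  *pwi = u[0];
  /* High surrogate: a low surrogate must follow.  */
  if (u[0] >= 0xd800 && u[0] <= 0xdbff)
    {
      if (n < 2 || u[1] < 0xdc00 || u[1] > 0xdfff)
        {
          errno = EILSEQ;
          return (size_t) -1;
        }
      /* Same arithmetic as the patch: 10 bits from each unit.  */
      *pwi = (((uint32_t) (u[0] & 0x3ff) << 10) | (u[1] & 0x3ff)) + 0x10000;
      return 2;
    }
  return 1;
}
```

As in the patch, a lone low surrogate is passed through unchanged rather than rejected; only a high surrogate without a matching low surrogate is treated as an illegal sequence.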