public inbox for gcc-help@gcc.gnu.org
From: Martin Sebor <msebor@gmail.com>
To: papa@arbolone.ca, Riot <rain.backnet@gmail.com>,
	 mingw-w64-public@lists.sourceforge.net
Cc: gcc-help Mailing List <gcc-help@gcc.gnu.org>
Subject: Re: [Mingw-w64-public] toUpper()
Date: Wed, 01 Jul 2015 15:18:00 -0000
Message-ID: <559404A3.30207@gmail.com>
In-Reply-To: <309695B4BE814C0887F18FB11845328E@ArbolOneLT>

[-- Attachment #1: Type: text/plain, Size: 4523 bytes --]

On 07/01/2015 06:02 AM, papa@arbolone.ca wrote:
> std::wstring source(L"Hello World");
> std::wstring destination;
> destination.resize(source.size());
> std::transform (source.begin(), source.end(), destination.begin(),
> (int(*)(int))std::toupper);
>
> The above code is what did the trick, do not ask how, I am still
> digesting it. However, any suggestions would be very much appreciated

This solved problem (1) below but doesn't work correctly or
portably because of the second problem I described in my first
response. std::toupper(int) is defined for narrow characters in
the range [0, UCHAR_MAX] plus EOF. The function has undefined
behavior for characters outside that range (i.e., all wchar_t
greater than UCHAR_MAX).

I don't know what will happen on Windows(*) but on Linux, I can
see the program doesn't work correctly for the Latin Extended
Additional block of characters (the first one I noticed). For
instance, running the attached modified version of the program
in a UTF-8 locale such as en_US.utf8 to convert U+1EBD (LATIN
SMALL LETTER E WITH TILDE) to its uppercase form (U+1EBC)
prints:

     U+1EBD  U+1EBC  U+1EBD

when the expected output is:

     U+1EBD  U+1EBC  U+1EBC

If you want to use transform with wide characters, you need
to use towupper (declared in <wctype.h>, or as std::towupper
in <cwctype>).
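A minimal sketch of that fix follows. It assumes C++11 for the lambda, and the name to_upper_wide is hypothetical. The lambda hands each character to std::towupper as a wint_t, so no cast to a function-pointer type is needed and toupper(int) is never called with a wide character:

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical helper: upper-case a wide string with std::towupper.
// The lambda converts each wchar_t to wint_t (towupper's parameter
// type) and converts the result back to wchar_t. The mapping follows
// the LC_CTYPE category of the current C locale.
std::wstring to_upper_wide(std::wstring s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t c) {
                       return static_cast<wchar_t>(
                           std::towupper(static_cast<std::wint_t>(c)));
                   });
    return s;
}
```

Like towupper itself, the result depends on the current C locale, so call std::setlocale first (as the attached program does) if characters beyond ASCII matter.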

Martin

[*] I vaguely recall toupper and friends aborting on Windows
when passed an out-of-range argument but I'm not 100% sure.

>
> -----Original Message----- From: Martin Sebor
> Sent: Tuesday, June 30, 2015 10:01 PM
> To: Riot ; mingw-w64-public@lists.sourceforge.net
> Cc: gcc-help Mailing List
> Subject: Re: [Mingw-w64-public] toUpper()
>
> On 06/30/2015 05:24 PM, Riot wrote:
>>      #include <algorithm>
>>      #include <string>
>>
>>      std::string str = "Hello World";
>>      std::transform(str.begin(), str.end(), str.begin(), std::toupper);
>
> Please note this code is subtly incorrect for two reasons.
> There are two overloads of std::toupper:
>
> 1) int toupper(int) declared in <ctype.h> (and the equivalent
>     std::toupper in <cctype>)
> 2) template <class charT> charT std::toupper(charT, const locale&)
>     in <locale>
>
> Without the right #include directive, the above may or may
> not resolve to "the right" function (which depends on what
> declarations the two headers bring into scope).
>
> When it resolves to (2) it will fail to compile.
>
> When it resolves to (1), it will do the wrong thing (have
> undefined behavior) at runtime when char is a signed type
> and the argument is negative (because (1) is only defined
> for values between -1 and UCHAR_MAX).
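For the narrow-string case, one common way to avoid both problems is sketched below. It assumes C++11, and to_upper_narrow is a hypothetical name. The lambda takes unsigned char, so a negative char never reaches toupper(int). Because std::toupper is called with a single argument inside the lambda, only overload (1) can be selected:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: upper-case a narrow string with toupper(int).
// Converting each char to unsigned char keeps the argument within
// [0, UCHAR_MAX] even where char is signed. The lambda also avoids
// naming the std::toupper overload set as a function argument.
std::string to_upper_narrow(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return s;
}
```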
>
> But the question is about converting std::wstring to upper
> case and the above uses a narrow string. For wstring, the
> std::ctype<wchar_t>::toupper() function or its convenience
> non-member template function can be used.
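Here is a sketch of the non-member template in use. It assumes C++11, the function name to_upper_locale is hypothetical, and the caller supplies the locale:

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: upper-case a wide string with the <locale>
// convenience template std::toupper(charT, const std::locale&), which
// is equivalent to std::use_facet<std::ctype<charT> >(loc).toupper(c).
std::wstring to_upper_locale(std::wstring s, const std::locale &loc)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](wchar_t c) { return std::toupper(c, loc); });
    return s;
}
```

std::locale::classic() gives the "C" locale. std::locale("") gives the user's preferred locale, but it may throw std::runtime_error if the environment names a locale the system does not support.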
>
>> See also: http://www.cplusplus.com/reference/locale/toupper/
>
> This is one possible way to do it. Another approach is along
> these lines:
>
>     std::locale loc (...);
>     std::wstring wstr = L"...";
>     const std::ctype<wchar_t> &ct =
>         std::use_facet<std::ctype<wchar_t> >(loc);
>     ct.toupper (&wstr[0], &wstr[0] + wstr.size());
>
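The facet approach could be filled out into a self-contained function like the sketch below. The locale is taken as a parameter rather than guessing at the elided constructor arguments, and to_upper_in_place is a hypothetical name:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case a wide string in place through the
// std::ctype<wchar_t> facet of the given locale. The range overload
// ctype<wchar_t>::toupper(wchar_t*, const wchar_t*) converts every
// character in [low, high).
void to_upper_in_place(std::wstring &wstr, const std::locale &loc)
{
    if (wstr.empty())
        return;  // nothing to convert; also avoids forming &wstr[0]
    const std::ctype<wchar_t> &ct =
        std::use_facet<std::ctype<wchar_t> >(loc);
    ct.toupper(&wstr[0], &wstr[0] + wstr.size());
}
```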
> Martin
>
>>
>> This may also help in future: http://lmgtfy.com/?q=c%2B%2B+toupper
>>
>> -Riot
>>
>> On 30 June 2015 at 23:58,  <papa@arbolone.ca> wrote:
>>> I would like to write a function to capitalize letters, say...
>>> std::wstring toUpper(const std::wstring wstr){
>>> for ( auto it = wstr.begin(); it != wstr.end(); ++it){
>>>          global_wapstr.append(std::towupper(&it));
>>>
>>> }
>>> }
>>>
>>> This doesn't work, but doesn't the standard already have something like
>>> std::wstring::toUpper(...)?
>>>
>>> Thanks in advance
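std::basic_string has no toUpper() member. The loop above has three problems: it passes the address of the iterator (&it) instead of the character (*it), it appends to a global variable, and it never returns a value. A corrected version might look like the following sketch, which keeps the function's name, assumes C++11 for the range-based for, and takes the argument by const reference:

```cpp
#include <cwctype>
#include <string>

// Corrected sketch of the toUpper() loop from the question: read each
// character (not the iterator's address), upper-case it with
// std::towupper, append it with push_back, and return a local result.
std::wstring toUpper(const std::wstring &wstr)
{
    std::wstring result;
    result.reserve(wstr.size());
    for (wchar_t c : wstr)
        result.push_back(static_cast<wchar_t>(
            std::towupper(static_cast<std::wint_t>(c))));
    return result;
}
```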
>


[-- Attachment #2: u.cpp --]
[-- Type: text/x-c++src, Size: 608 bytes --]

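#include <algorithm>
#include <cctype>
#include <cwctype>
#include <clocale>
#include <cstdio>
#include <string>

int main ()
{
    // Use the user's locale (e.g., en_US.utf8); give up if it can't be set.
    if (!std::setlocale (LC_ALL, ""))
        return 1;

    // convert LATIN SMALL LETTER E WITH TILDE (U+1EBD)
    // to LATIN CAPITAL LETTER E WITH TILDE (U+1EBC)
    std::wstring source(L"\x1ebd");
    std::wstring destination;
    destination.resize(source.size());

    // The call under discussion: toupper(int) is only defined for
    // [0, UCHAR_MAX] and EOF, so passing U+1EBD is undefined behavior.
    std::transform (source.begin(), source.end(), destination.begin(), (int(*)(int))std::toupper);

    // Print the source character, the correct towupper() result, and the
    // result of the transform above; the casts match the %X conversion.
    std::printf ("U+%04X  U+%04X  U+%04X\n",
                 (unsigned)source [0], (unsigned)std::towupper (source [0]),
                 (unsigned)destination [0]);
}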
