public inbox for gcc-bugs@sourceware.org
* [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs
@ 2023-11-21 15:04 ro at gcc dot gnu.org
  2023-11-21 15:05 ` [Bug c++/112652] " ro at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: ro at gcc dot gnu.org @ 2023-11-21 15:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

            Bug ID: 112652
           Summary: g++.dg/cpp26/literals2.C FAILs
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ro at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---
              Host: *-*-solaris2.11
            Target: *-*-solaris2.11
             Build: *-*-solaris2.11

The g++.dg/cpp26/literals2.C test FAILs on Solaris, both SPARC and x86:

+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++14  (test for errors, line 10)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++14  (test for errors, line 11)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++14  (test for errors, line 13)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++14  (test for errors, line 41)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++14  (test for errors, line 42)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++14  (test for errors, line 44)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++17  (test for errors, line 10)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++17  (test for errors, line 11)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++17  (test for errors, line 13)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++17  (test for errors, line 41)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++17  (test for errors, line 42)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++17  (test for errors, line 44)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++20  (test for errors, line 10)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++20  (test for errors, line 11)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++20  (test for errors, line 13)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++20  (test for errors, line 41)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++20  (test for errors, line 42)
+FAIL: g++.dg/cpp26/literals2.C  -std=gnu++20  (test for errors, line 44)

I initially thought this was due to make check being run with LANG=C, but I
get the same errors with LANG=en_US.UTF-8.

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
@ 2023-11-21 15:05 ` ro at gcc dot gnu.org
  2023-11-21 18:32 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ro at gcc dot gnu.org @ 2023-11-21 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Rainer Orth <ro at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
  2023-11-21 15:05 ` [Bug c++/112652] " ro at gcc dot gnu.org
@ 2023-11-21 18:32 ` jakub at gcc dot gnu.org
  2023-11-22 15:09 ` ro at CeBiTec dot Uni-Bielefeld.DE
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-11-21 18:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Strange.  On cfarm211 which is
SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
the test passes.
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:7:9: warning:
multi-character character constant [-Wmultichar]
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:8:9: warning:
multi-character character constant [-Wmultichar]
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:10:9: error:
converting to execution character set: Illegal byte sequence
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:11:9: error:
named universal character escapes are only valid in C++23
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:11:9: error:
converting UCN to execution character set: Illegal byte sequence
/export/home/jakub/gcc/gcc/testsuite/g++.dg/cpp26/literals2.C:13:9: error:
converting UCN to execution character set: Illegal byte sequence
...
You get no diagnostics for those lines at all?  Buggy libiconv?
I mean the emojis certainly aren't in ISO-8859-1...

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
  2023-11-21 15:05 ` [Bug c++/112652] " ro at gcc dot gnu.org
  2023-11-21 18:32 ` jakub at gcc dot gnu.org
@ 2023-11-22 15:09 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2023-11-22 15:26 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2023-11-22 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Strange.  On cfarm211 which is
> SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> the test passes.

Can you check which libiconv got picked up there?  The non-standard
OpenCSW packages on that system may include GNU libiconv and install
into default system directories, so they are picked up by default.

> You get no diagnostics for those lines at all?  Buggy libiconv?

No.  There's no separate libiconv on Solaris; the iconv* functions are
included in libc.

> I mean the emojis certainly aren't in ISO-8859-1...

Probably not ;-)

FWIW, I've just built trunk with GNU libiconv 1.17 on
i386-pc-solaris2.11.  The test PASSes now with both LANG=C and
LANG=en_US.UTF-8.

I'll dig further into Solaris iconv functions here...

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-11-22 15:09 ` ro at CeBiTec dot Uni-Bielefeld.DE
@ 2023-11-22 15:26 ` jakub at gcc dot gnu.org
  2023-11-24  9:05 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-11-22 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #2)
> > --- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > Strange.  On cfarm211 which is
> > SunOS gcc-solaris11 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> > the test passes.
> 
> Can you check which libiconv got picked up there?  The non-standard
> OpenCSW packages on that system may include GNU libiconv and install
> into default system directories, so they are picked up by default.

/opt/csw/lib/libiconv.so.2
> 
> > You get no diagnostics for those lines at all?  Buggy libiconv?
> 
> No.  There's no separate libiconv on Solaris; the iconv* functions are
> included in libc.

On Linux I get:
echo á | iconv -f UTF-8 -t ASCII -; echo 😁 | iconv -f UTF-8 -t ISO-8859-1 -
iconv: illegal input sequence at position 0
iconv: illegal input sequence at position 0
while on Solaris
echo á | iconv -f UTF-8 -t ASCII -; echo 😁 | iconv -f UTF-8 -t ISO-8859-1 -
?
?
If it maps all characters which do not have representation in the destination
character set into ?, then it is useless for the test in question.
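
The same difference can be observed at the API level.  What follows is only a
minimal sketch, not code from libcpp: classify_conversion is a hypothetical
helper name, and the fixed 256-byte output buffer is an arbitrary assumption.
POSIX specifies that iconv() returns the number of non-identical conversions
performed, while glibc and GNU libiconv instead fail with EILSEQ for a
character that has no representation in the target.  Whether Solaris reports a
non-zero count when it substitutes '?' has not been verified in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert INPUT from charset FROM to charset TO and classify the outcome.
   Returns -1 if iconv() rejects the input with EILSEQ, -2 on any other
   failure (unknown charset, output buffer too small, ...), and otherwise
   the number of non-identical conversions that iconv() reported.  */
long
classify_conversion (const char *from, const char *to, const char *input)
{
  assert (from != NULL && to != NULL && input != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -2;

  char outbuf[256];               /* Arbitrary size, enough for a demo.  */
  char *in = (char *) input;      /* iconv() does not modify the input.  */
  char *out = outbuf;
  size_t inleft = strlen (input);
  size_t outleft = sizeof outbuf;

  errno = 0;
  size_t n = iconv (cd, &in, &inleft, &out, &outleft);
  int saved_errno = errno;
  iconv_close (cd);

  if (n == (size_t) -1)
    return saved_errno == EILSEQ ? -1 : -2;
  return (long) n;
}
```

Calling classify_conversion ("UTF-8", "ASCII", "abc") should return 0 on any
conforming implementation.  For input containing á, the Linux behaviour shown
above corresponds to the -1 case.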

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-11-22 15:26 ` jakub at gcc dot gnu.org
@ 2023-11-24  9:05 ` jakub at gcc dot gnu.org
  2024-03-12 15:51 ` ro at CeBiTec dot Uni-Bielefeld.DE
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-11-24  9:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Given that C++ says, e.g. in https://eel.is/c++draft/lex.ccon#3.1, that a
program is ill-formed if some character lacks an encoding in the execution
character set, I'm afraid the Solaris iconv behavior results in a violation of
the C++ standard requirements.  It is hard to argue that in the Solaris case
e.g. the ISO-8859-1 execution charset would be some special character set where
the ? character represents, in addition to ? itself, all Unicode characters
which don't have a representation in the character set.
I'm afraid we don't want to maintain iconv replacement inside of libcpp though.

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-11-24  9:05 ` jakub at gcc dot gnu.org
@ 2024-03-12 15:51 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2024-03-13 13:22 ` ro at CeBiTec dot Uni-Bielefeld.DE
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-03-12 15:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
> that program is ill-formed if some character lacks encoding in the execution
> character set, I'm afraid the Solaris iconv behavior results in violation of
> the C++ standard requirements, it is hard to argue that in the Solaris case
> e.g. ISO-8859-1 execution charset would be some special character set where ?
> character represents all Unicode characters which don't have a representation
> in the character set in addition to ?.

I've now started digging into this myself.

* Solaris iconv(1) says

       output. If no conversion exists for a particular character,  an  imple-
       mentation-defined conversion is performed on this character.

* This seems to at least partially match with XPG7:

-s  Suppress any messages written to standard error concerning invalid
    characters. When -s is not used, the results of encountering invalid
    characters in the input stream (either those that are not valid
    characters in the codeset of the input file or that have no
    corresponding character in the codeset of the output file) shall be
    specified in the system documentation. The presence or absence of -s
    shall not affect the exit status of iconv.

  AFAIU that's related to what Solaris iconv(1) does, although they
  don't specify the output '?' and produce no message.  However, they
  still exit with 0, which seems wrong to me.

I've not yet tried to understand what either iconv(3) has to say on the
matter.

> I'm afraid we don't want to maintain iconv replacement inside of libcpp though.

Agreed.

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-03-12 15:51 ` ro at CeBiTec dot Uni-Bielefeld.DE
@ 2024-03-13 13:22 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2024-03-13 13:46 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-03-13 13:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #6 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot
> Uni-Bielefeld.DE> ---
>> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
>> that program is ill-formed if some character lacks encoding in the execution
>> character set, I'm afraid the Solaris iconv behavior results in violation of

Although I can barely wrap my head around the standardese there, I had a
look at n4928 (the last? C++23 draft), which has a different wording
here (p.25, 5.13.3):

(3.1) — A character-literal with a c-char-sequence consisting of a
         single basic-c-char, simple-escape-sequence, or
         universal-character-name is the code unit value of the
         specified character as encoded in the literal’s associated
         character encoding.

         [Note 2 : If the specified character lacks representation in
         the literal’s associated character encoding or if it cannot be
         encoded as a single code unit, then the literal is a
         non-encodable character literal. —end note]

> I've not yet tried to understand what either iconv(3) has to say on the
> matter.

Digging further, Solaris iconv(3C) has

       If  iconv()  encounters  a character in the input buffer that is legal,
       but for which an identical character does not exist in the target  code
       set,  iconv()  performs  an  implementation-defined  conversion on this
       character.

which exactly matches XPG7, so the behaviour seems to be in line with
the standards.

I've also found that Solaris 11 has iconvctl(3C) (obviously patterned
after GNU libiconv) with

       ICONV_SET_TRANSLITERATE

           With  this  request  and  a  pointer to a const int with a non-zero
           value, caller can instruct the current conversion to  transliterate
           non-identical characters from the input buffer during the code con-
           version  as  much  as it can. The value of zero, on the other hand,
           turns it off.

However,

        int transliterate = 0;
        iconvctl (cd, ICONV_SET_TRANSLITERATE, &transliterate);

doesn't make a difference.

The current Solaris iconv behaviour certainly isn't particularly
intuitive and I'll ask the Solaris engineers about it.  However, there's
the question of what to do about the testcase: just xfail it on Solaris,
or omit only the two affected subtests there?

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-03-13 13:22 ` ro at CeBiTec dot Uni-Bielefeld.DE
@ 2024-03-13 13:46 ` jakub at gcc dot gnu.org
  2024-03-13 14:59 ` ro at CeBiTec dot Uni-Bielefeld.DE
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-13 13:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #6)
> > --- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot
> > Uni-Bielefeld.DE> ---
> >> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> >> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
> >> that program is ill-formed if some character lacks encoding in the execution
> >> character set, I'm afraid the Solaris iconv behavior results in violation of
> 
> Although I can barely wrap my head around the standardese there, I had a
> look at n4928 (the last? C++23 draft), which has a different wording
> here (p.25, 5.13.3):

The testcase is for a C++26 feature, which made those ill-formed.

> The current Solaris iconv behaviour certainly isn't particularly
> intuitive and I'll ask the Solaris engineers about it.  However, there's
> the question what to do about the testcase?  Just xfail it on Solaris or
> omit just the two affected subtests there?

xfailing is one possibility, but then on Solaris we'll never support C++26
properly.
Or require using GNU libiconv rather than Solaris iconv if it can't deal with
that?

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-03-13 13:46 ` jakub at gcc dot gnu.org
@ 2024-03-13 14:59 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2024-03-13 15:45 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-03-13 14:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #8 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #6)
>> > --- Comment #5 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot
>> > Uni-Bielefeld.DE> ---
>> >> --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>> >> Given that C++ says e.g. in https://eel.is/c++draft/lex.ccon#3.1
>> >> that program is ill-formed if some character lacks encoding in the execution
>> >> character set, I'm afraid the Solaris iconv behavior results in violation of
>> 
>> Although I can barely wrap my head around the standardese there, I had a
>> look at n4928 (the last? C++23 draft), which has a different wording
>> here (p.25, 5.13.3):
>
> The testcase is for a C++26 feature, which made those ill-formed.

Should have been obvious from the pathname ;-(  N4971 has that wording...

>> The current Solaris iconv behaviour certainly isn't particularly
>> intuitive and I'll ask the Solaris engineers about it.  However, there's
>> the question what to do about the testcase?  Just xfail it on Solaris or
>> omit just the two affected subtests there?
>
> xfailing is one possibility, but then on Solaris we'll never support C++26
> properly.

I guess it's the best solution in the short term (GCC 14), though.

> Or require using GNU libiconv rather than Solaris iconv if it can't deal with
> that?

At least document the suggestion in install.texi; I wouldn't make it a
hard requirement yet.  I'll also wait to see what background the Solaris
engineers can provide on the current behaviour.

FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
from the OpenSolaris sources, modified not to do that '?' conversion.
Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
more prevalent and would need to be eradicated upstream.

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-03-13 14:59 ` ro at CeBiTec dot Uni-Bielefeld.DE
@ 2024-03-13 15:45 ` jakub at gcc dot gnu.org
  2024-03-22 13:52 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2024-05-07  7:42 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-13 15:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #8)
> FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
> from the OpenSolaris sources, modified not to do that '?' conversion.
> Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
> more prevalent and would need to be eradicated upstream.

If '?' is always used in place of an unknown character, we could also have some
hack on the libcpp side for it.
For example (limited to Solaris hosts), convert_using_iconv, when converting
from SOURCE_CHARSET to some other character set, would not try to convert the
whole UTF-8 string at once, but would split it into chunks at u'?' characters, so
foo???bar?baz?qux
would be iconv converted as
foo
???
bar
?
baz
?
qux
chunks.  When converting the non-? chunks, it would check the converted output
for the '?' character (in the destination character set; its encoding could
perhaps be queried during initialization after iconv_open) and treat it as an
error if it appeared there.  Alternatively, always convert back to UTF-8 as
well and check whether the result has more '?' characters than the source.
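
The splitting step described above can be sketched as follows.  This is only
an illustration under stated assumptions, not the actual libcpp change:
next_chunk and has_substitution are hypothetical names, and has_substitution
assumes that '?' occupies a single byte in the destination charset.  Splitting
on bytes is safe for UTF-8 input because the byte 0x3f never occurs inside a
multi-byte UTF-8 sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the chunk starting at S, where LEN > 0 bytes remain.
   A chunk is a maximal run made up either entirely of '?' bytes or entirely
   of other bytes.  *IS_QMARK is set to tell which kind was found.  */
size_t
next_chunk (const char *s, size_t len, bool *is_qmark)
{
  assert (s != NULL && is_qmark != NULL && len > 0);

  *is_qmark = (s[0] == '?');
  size_t n = 1;
  while (n < len && (s[n] == '?') == *is_qmark)
    n++;
  return n;
}

/* After converting a chunk that contained no '?', any occurrence of QMARK
   (the destination charset's encoding of '?', assumed to be one byte) means
   the converter substituted an unrepresentable character.  */
bool
has_substitution (const char *converted, size_t len, char qmark)
{
  assert (converted != NULL);
  return memchr (converted, qmark, len) != NULL;
}
```

A caller would loop over the source with next_chunk, pass each chunk to
iconv separately, and run has_substitution only on the output of chunks for
which *IS_QMARK was false.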

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-03-13 15:45 ` jakub at gcc dot gnu.org
@ 2024-03-22 13:52 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2024-05-07  7:42 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-03-22 13:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

--- Comment #10 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #8)
>> FWIW, the iconv conversion tables in /usr/lib/iconv can be regenerated
>> from the OpenSolaris sources, modified not to do that '?' conversion.
>> Worked for a quick check for the UTF-8 -> ASCII example, but the '?' is
>> more prevalent and would need to be eradicated upstream.
>
> If it is always '?' used instead of unknown character, we could also have some
> hack on the libcpp side for it.

It took me a bit to get back to you here since I had to check with both
Solaris engineering and dig up our old Solaris 9 sources (which, unlike
OpenSolaris, have no relevant parts missing due to copyright issues).

Both what I found in the iconv conversion tables and what's documented
in unicode_iconv(7) confirm the consistent use of '?'.  The manpage has

       If the source character code value is not within a range defined by the
       source  codeset  standard, it is considered as an illegal character. If
       the source character code value is within the range(s) defined  by  the
       standard,  it  will  be considered as non-identical, even if the source
       character code value maps to an undefined or a reserved location within
       the valid range. The non-identical character will map to either ? (0x3f
       in ASCII-compatible codesets) if the target codeset  is  a  non-Unicode
       codeset  or  to  Unicode  replacement  character (U+FFFD) if the target
       codeset is an Unicode codeset.

It will of course be in the respective charset's encoding (0x3f for
ASCII, 0x6f for EBCDIC), but that's all I could find.  This is not a
complete guarantee (I may well have missed something), but seems
plausible enough...
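
One way the target encoding of '?' might be determined at run time, rather
than hard-coded per charset, is to convert a single '?' once after
iconv_open.  The following is a rough sketch only: probe_qmark is a
hypothetical name, and the approach assumes a stateless target charset that
does not prepend a byte-order mark or shift sequence.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert a single '?' from UTF-8 to charset TO and store the resulting
   bytes in BUF, which has room for CAP bytes.  Returns the number of bytes
   written, or 0 if the charset is unknown or the conversion fails.  */
size_t
probe_qmark (const char *to, char *buf, size_t cap)
{
  assert (to != NULL && buf != NULL);

  iconv_t cd = iconv_open (to, "UTF-8");
  if (cd == (iconv_t) -1)
    return 0;

  char src[] = "?";
  char *in = src;
  char *out = buf;
  size_t inleft = 1;
  size_t outleft = cap;
  size_t n = iconv (cd, &in, &inleft, &out, &outleft);
  iconv_close (cd);

  return n == (size_t) -1 ? 0 : cap - outleft;
}
```

For an ASCII target this should yield the single byte 0x3f; per the numbers
above, an EBCDIC target would be expected to yield 0x6f instead.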

> Like (but limited to Solaris hosts) in convert_using_iconv when converting from
> SOURCE_CHARSET to some other character set don't try to convert the whole UTF-8
> string at once, but split it into chunks at u'?' characters, so
> foo???bar?baz?qux
> would be iconv converted as
> foo
> ???
> bar
> ?
> baz
> ?
> qux
> chunks.  And when converting the non-? chunks, it would after the conversion
> check for the '?' character (in the destination character set - that is
> something that perhaps could be queried during initialization after iconv_open)
> and treat it as an error if it appeared there.  Or always convert also back to
> UTF-8 and check if it has more '?' characters than the source.

Unless we want to take the easy way out and just require GNU libiconv on
Solaris, that seems like a plausible way of handling the issue.

* [Bug c++/112652] g++.dg/cpp26/literals2.C FAILs
  2023-11-21 15:04 [Bug c++/112652] New: g++.dg/cpp26/literals2.C FAILs ro at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-03-22 13:52 ` ro at CeBiTec dot Uni-Bielefeld.DE
@ 2024-05-07  7:42 ` rguenth at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-07  7:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
