public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* Questions about std::string::operator=(char)
       [not found] <7eeb24bf-58cc-5316-807e-64434ecf4b83.ref@yahoo.fr>
@ 2021-03-27  2:25 ` tika
  2021-03-27 10:35   ` Jonathan Wakely
  0 siblings, 1 reply; 5+ messages in thread
From: tika @ 2021-03-27  2:25 UTC (permalink / raw)
  To: libstdc++


Synopsys:?? assigning a char to a std::string makes a string 1 char long, 
but what happens if that char is zero??? And showing how easily can this 
UB pass undetected through unit testing...

The story:?? I created a stupid bug, that my unit testing did not 
catch....?? And ended up spending more more time debugging the test than 
fixing my bug.

The bug itself is kind of boring...?? Using the return value of int 
snd_get_name(int, char*) C function from the alsa library, and putting 
an int into a string. That's what I think could be a rather common mistake.

Here's my piece of code, the various tests that could be applied (and 
most fail to detect the bug in my code).?? It turns out that assigning a 
null character to a std::string breaks a lot iof invariants we usually 
depend on.?? All of gcc 10.2.1 (with libstdc++-10.2.1-9), clang 11.0.1 
and icc 2021.1.2 fail to warn of the mistake with -Wall, for gcc at 
least, that is both both on my system (fedora) and on godbolt.?? MSVC 
does a lot better, with both warnings and different outcome, with str = 
0 (correctly ?) resulting in an empty string of length zero.

I know it is UB, but I'm pretty sure the fact that this can go 
undetected so easily by the most common unit testing methods makes it an 
interseting one.?? Is it a feature? Is it something that is part of the 
standard's private UB collection? What does the standard say about 
expected invariants such as s == s.c_str() ? And about the contract of 
std::basic_string::operator(Char) ?

So, here's some code that show the issue and how stcky it is... You can 
also find it here: https://godbolt.org/z/cTWGWohv3 
<https://godbolt.org/z/cTWGWohv3>

#include<string>
#include<cstring>
#include<iostream>

// -------------------------------------------------------------------------
// simulating what is the C driver FYI, it's the alsa sound driver,
// but that's not much relevant

char*some_text =nullptr;

voidinit_text()
{
 ?????? some_text =strdup("some text");
}

intget_card_name(int,char** x)
{
*x =some_text;
return0;
}

// end 'driver' code
// -------------------------------------------------------------------------

intmain()
{
 ?????? init_text();

 ?????? inti =0;
char*name;
 ?????? std::string s;

// my original bug... &"??! happens.
 ?????? s =get_card_name(i,&name);

// Should have read:
// if (!get_card_name(i, &name))
// s = name;

// simulating EXPECT_FALSE(s.empty());
 ?????? std::cout <<"s.empty() : "<<s.empty()<<'\n';

// simulating EXPECT_NE(s.length(), 0);
 ?????? std::cout <<"s.length() : "<<s.length()<<'\n';

// simulating EXPECT_NE(s, "");
 ?????? std::cout <<"(s == \"\") : "<<(s =="")<<'\n';

// simulating EXPECT_EQ(s, s.c_str());
 ?????? std::cout <<"(s == s.c_str()): "<<(s ==s.c_str())<<" <- bug 
detected!\n";

// checking for transmisssion by copy
 ?????? std::string t =s;
 ?????? std::cout <<"t == s : "<<(t ==s)<<" <- transmission of odd 
behaviour by copy\n";

// a trace to check contents:
 ?????? std::cout <<"s : \""<<s <<"\"\n";


if(!s.empty())
 ?????? std::cout <<"int(s.front()) : "<<(int)s.front()<<'\n';

 ?????? free(some_text);
returns.length();
}



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Questions about std::string::operator=(char)
  2021-03-27  2:25 ` Questions about std::string::operator=(char) tika
@ 2021-03-27 10:35   ` Jonathan Wakely
  2021-03-27 10:37     ` Jonathan Wakely
  2021-03-27 14:58     ` tika
  0 siblings, 2 replies; 5+ messages in thread
From: Jonathan Wakely @ 2021-03-27 10:35 UTC (permalink / raw)
  To: tika; +Cc: libstdc++

On Sat, 27 Mar 2021 at 02:26, tika via Libstdc++ <libstdc++@gcc.gnu.org> wrote:
>
>
> Synopsys:?? assigning a char to a std::string makes a string 1 char long,
> but what happens if that char is zero??? And showing how easily can this
> UB pass undetected through unit testing...

This is not UB.

>
> The story:?? I created a stupid bug, that my unit testing did not
> catch....?? And ended up spending more more time debugging the test than
> fixing my bug.
>
> The bug itself is kind of boring...?? Using the return value of int
> snd_get_name(int, char*) C function from the alsa library, and putting
> an int into a string. That's what I think could be a rather common mistake.
>
> Here's my piece of code, the various tests that could be applied (and
> most fail to detect the bug in my code).?? It turns out that assigning a
> null character to a std::string breaks a lot iof invariants we usually
> depend on.?? All of gcc 10.2.1 (with libstdc++-10.2.1-9), clang 11.0.1
> and icc 2021.1.2 fail to warn of the mistake with -Wall, for gcc at
> least, that is both both on my system (fedora) and on godbolt.?? MSVC
> does a lot better, with both warnings and different outcome, with str =
> 0 (correctly ?) resulting in an empty string of length zero.

Are you sure you tested the same thing?

> I know it is UB,

It's not.

The std::string::operator=(char) overload gets used to assign a single
char to the string, just as though you'd one str.assign("\0", 1).

but I'm pretty sure the fact that this can go
> undetected so easily by the most common unit testing methods makes it an
> interseting one.?? Is it a feature? Is it something that is part of the
> standard's private UB collection? What does the standard say about
> expected invariants such as s == s.c_str() ? And about the contract of
> std::basic_string::operator(Char) ?

There is no contract. You are allowed to put char(0) into a std::string.

This is not undefined, and is working as required by the standard.

> So, here's some code that show the issue and how stcky it is... You can
> also find it here: https://godbolt.org/z/cTWGWohv3
> <https://godbolt.org/z/cTWGWohv3>
>
> #include<string>
> #include<cstring>
> #include<iostream>
>
> // -------------------------------------------------------------------------
> // simulating what is the C driver FYI, it's the alsa sound driver,
> // but that's not much relevant
>
> char*some_text =nullptr;
>
> voidinit_text()
> {
>  ?????? some_text =strdup("some text");
> }
>
> intget_card_name(int,char** x)
> {
> *x =some_text;
> return0;
> }
>
> // end 'driver' code
> // -------------------------------------------------------------------------
>
> intmain()
> {
>  ?????? init_text();
>
>  ?????? inti =0;
> char*name;
>  ?????? std::string s;
>
> // my original bug... &"??! happens.
>  ?????? s =get_card_name(i,&name);
>
> // Should have read:
> // if (!get_card_name(i, &name))
> // s = name;
>
> // simulating EXPECT_FALSE(s.empty());
>  ?????? std::cout <<"s.empty() : "<<s.empty()<<'\n';
>
> // simulating EXPECT_NE(s.length(), 0);
>  ?????? std::cout <<"s.length() : "<<s.length()<<'\n';
>
> // simulating EXPECT_NE(s, "");
>  ?????? std::cout <<"(s == \"\") : "<<(s =="")<<'\n';
>
> // simulating EXPECT_EQ(s, s.c_str());
>  ?????? std::cout <<"(s == s.c_str()): "<<(s ==s.c_str())<<" <- bug
> detected!\n";

This is not a bug. The std::string "\0" is not equal to the const
char* "\0" because the std::string has length 1, but the length of the
const char* is 0.

> // checking for transmisssion by copy
>  ?????? std::string t =s;
>  ?????? std::cout <<"t == s : "<<(t ==s)<<" <- transmission of odd
> behaviour by copy\n";
>
> // a trace to check contents:
>  ?????? std::cout <<"s : \""<<s <<"\"\n";
>
>
> if(!s.empty())
>  ?????? std::cout <<"int(s.front()) : "<<(int)s.front()<<'\n';
>
>  ?????? free(some_text);
> returns.length();
> }
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Questions about std::string::operator=(char)
  2021-03-27 10:35   ` Jonathan Wakely
@ 2021-03-27 10:37     ` Jonathan Wakely
  2021-03-27 14:58     ` tika
  1 sibling, 0 replies; 5+ messages in thread
From: Jonathan Wakely @ 2021-03-27 10:37 UTC (permalink / raw)
  To: tika; +Cc: libstdc++

On Sat, 27 Mar 2021 at 10:35, Jonathan Wakely wrote:
>
> On Sat, 27 Mar 2021 at 02:26, tika via Libstdc++ <libstdc++@gcc.gnu.org> wrote:
> >
> >
> > Synopsys:?? assigning a char to a std::string makes a string 1 char long,
> > but what happens if that char is zero??? And showing how easily can this
> > UB pass undetected through unit testing...
>
> This is not UB.
>
> >
> > The story:?? I created a stupid bug, that my unit testing did not
> > catch....?? And ended up spending more more time debugging the test than
> > fixing my bug.
> >
> > The bug itself is kind of boring...?? Using the return value of int
> > snd_get_name(int, char*) C function from the alsa library, and putting
> > an int into a string. That's what I think could be a rather common mistake.

See the proposal
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2037r1.html

The C++ committee rejected the proposal, so it won't be part of the
C++ standard, but it might help you understand the behaviour you see
(and why GCC is correct).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Questions about std::string::operator=(char)
  2021-03-27 10:35   ` Jonathan Wakely
  2021-03-27 10:37     ` Jonathan Wakely
@ 2021-03-27 14:58     ` tika
  2021-03-27 15:32       ` Jonathan Wakely
  1 sibling, 1 reply; 5+ messages in thread
From: tika @ 2021-03-27 14:58 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++

I understand.  I do not think there is a bug in the STL, only a grey area.

My primary concern is the detection of the bug in my (the user) code in 
unit testing.  That's the bug I mean in the 'bug detected!' comment.

A warning by the compiler would be nice and sufficient to avoid most 
cases of this mistake, which unit testing can detect like this:

     std::string s;
     s = some_function_returning_int_0();
     EXPECT_EQ(s, s.c_str());

Which makes it seem UB is involved.  And the fact that the MS STL 
handles the case differently, with a warning, which is nice, and 
providing a different result (making s zero length) does force the case 
into UB territory, doesn't it?

Michaël Roy
m:o)

Le 27/03/2021 à 11:35, Jonathan Wakely a écrit :
> On Sat, 27 Mar 2021 at 02:26, tika via Libstdc++ <libstdc++@gcc.gnu.org> wrote:
>>
>> Synopsys:?? assigning a char to a std::string makes a string 1 char long,
>> but what happens if that char is zero??? And showing how easily can this
>> UB pass undetected through unit testing...
> This is not UB.
>
>> The story:?? I created a stupid bug, that my unit testing did not
>> catch....?? And ended up spending more more time debugging the test than
>> fixing my bug.
>>
>> The bug itself is kind of boring...?? Using the return value of int
>> snd_get_name(int, char*) C function from the alsa library, and putting
>> an int into a string. That's what I think could be a rather common mistake.
>>
>> Here's my piece of code, the various tests that could be applied (and
>> most fail to detect the bug in my code).?? It turns out that assigning a
>> null character to a std::string breaks a lot iof invariants we usually
>> depend on.?? All of gcc 10.2.1 (with libstdc++-10.2.1-9), clang 11.0.1
>> and icc 2021.1.2 fail to warn of the mistake with -Wall, for gcc at
>> least, that is both both on my system (fedora) and on godbolt.?? MSVC
>> does a lot better, with both warnings and different outcome, with str =
>> 0 (correctly ?) resulting in an empty string of length zero.
> Are you sure you tested the same thing?
>
>> I know it is UB,
> It's not.
>
> The std::string::operator=(char) overload gets used to assign a single
> char to the string, just as though you'd one str.assign("\0", 1).
>
> but I'm pretty sure the fact that this can go
>> undetected so easily by the most common unit testing methods makes it an
>> interseting one.?? Is it a feature? Is it something that is part of the
>> standard's private UB collection? What does the standard say about
>> expected invariants such as s == s.c_str() ? And about the contract of
>> std::basic_string::operator(Char) ?
> There is no contract. You are allowed to put char(0) into a std::string.
>
> This is not undefined, and is working as required by the standard.
>
>> So, here's some code that show the issue and how stcky it is... You can
>> also find it here: https://godbolt.org/z/cTWGWohv3
>> <https://godbolt.org/z/cTWGWohv3>
>>
>> #include<string>
>> #include<cstring>
>> #include<iostream>
>>
>> // -------------------------------------------------------------------------
>> // simulating what is the C driver FYI, it's the alsa sound driver,
>> // but that's not much relevant
>>
>> char*some_text =nullptr;
>>
>> voidinit_text()
>> {
>>   ?????? some_text =strdup("some text");
>> }
>>
>> intget_card_name(int,char** x)
>> {
>> *x =some_text;
>> return0;
>> }
>>
>> // end 'driver' code
>> // -------------------------------------------------------------------------
>>
>> intmain()
>> {
>>   ?????? init_text();
>>
>>   ?????? inti =0;
>> char*name;
>>   ?????? std::string s;
>>
>> // my original bug... &"??! happens.
>>   ?????? s =get_card_name(i,&name);
>>
>> // Should have read:
>> // if (!get_card_name(i, &name))
>> // s = name;
>>
>> // simulating EXPECT_FALSE(s.empty());
>>   ?????? std::cout <<"s.empty() : "<<s.empty()<<'\n';
>>
>> // simulating EXPECT_NE(s.length(), 0);
>>   ?????? std::cout <<"s.length() : "<<s.length()<<'\n';
>>
>> // simulating EXPECT_NE(s, "");
>>   ?????? std::cout <<"(s == \"\") : "<<(s =="")<<'\n';
>>
>> // simulating EXPECT_EQ(s, s.c_str());
>>   ?????? std::cout <<"(s == s.c_str()): "<<(s ==s.c_str())<<" <- bug
>> detected!\n";
> This is not a bug. The std::string "\0" is not equal to the const
> char* "\0" because the std::string has length 1, but the length of the
> const char* is 0.
>
>> // checking for transmisssion by copy
>>   ?????? std::string t =s;
>>   ?????? std::cout <<"t == s : "<<(t ==s)<<" <- transmission of odd
>> behaviour by copy\n";
>>
>> // a trace to check contents:
>>   ?????? std::cout <<"s : \""<<s <<"\"\n";
>>
>>
>> if(!s.empty())
>>   ?????? std::cout <<"int(s.front()) : "<<(int)s.front()<<'\n';
>>
>>   ?????? free(some_text);
>> returns.length();
>> }
>>
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Questions about std::string::operator=(char)
  2021-03-27 14:58     ` tika
@ 2021-03-27 15:32       ` Jonathan Wakely
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Wakely @ 2021-03-27 15:32 UTC (permalink / raw)
  To: tika; +Cc: libstdc++

On Sat, 27 Mar 2021 at 14:58, tika wrote:
>
> I understand.  I do not think there is a bug in the STL, only a grey area.
>
> My primary concern is the detection of the bug in my (the user) code in
> unit testing.  That's the bug I mean in the 'bug detected!' comment.
>
> A warning by the compiler would be nice and sufficient to avoid most
> cases of this mistake, which unit testing can detect like this:
>
>      std::string s;
>      s = some_function_returning_int_0();
>      EXPECT_EQ(s, s.c_str());
>
> Which makes it seem UB is involved.  And the fact that the MS STL
> handles the case differently, with a warning, which is nice, and
> providing a different result (making s zero length) does force the case
> into UB territory, doesn't it?

No, that's not how UB works. If MS does that then they have a bug. But
I've looked at their code, and I don't see how they can do what you
see, because the code looks like it creates a string of length 1:

https://github.com/microsoft/STL/blob/6aed1a69d64367bad60a4c9714d7658fe6c70f4d/stl/inc/xstring#L3095-L3101

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-03-27 15:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <7eeb24bf-58cc-5316-807e-64434ecf4b83.ref@yahoo.fr>
2021-03-27  2:25 ` Questions about std::string::operator=(char) tika
2021-03-27 10:35   ` Jonathan Wakely
2021-03-27 10:37     ` Jonathan Wakely
2021-03-27 14:58     ` tika
2021-03-27 15:32       ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).