public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* string::iterator should have more error checking
@ 2022-06-23 21:05 Frederick Virchanza Gotham
  2022-06-23 21:26 ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2022-06-23 21:05 UTC (permalink / raw)
  To: libstdc++

If a program is compiled with "-D_GLIBCXX_DEBUG", I would expect it at
runtime to catch the error on the last line in the following program:

#include <iostream>
#include <string>
#include <string_view>
#include <type_traits>

using namespace std;

int main(void)
{
    cout << "string::const_iterator is "
         << (is_same_v< string::const_iterator, char const * > ? "just
a raw pointer" : "NOT a simple pointer") << endl;

    cout << "string_view::const_iterator is "
         << (is_same_v< string_view::const_iterator, char const * > ?
"just a raw pointer" : "NOT a simple pointer") << endl;

    string s("brush");

    cout << string_view( &*(s.cbegin() + 1u), &*(s.cend() + 876u) ) << endl;
}


string::iterator is NOT a simple pointer -- it is a class and so we
can overload the following operators to catch errors:

(1) unary operator*
(2) binary operator+
(3) binary operator-

The error on the last line of the above program would be caught at
runtime if the iterator were written as follows:

class string {

    class iterator {

        char const *const p_min, *const p_max;  // initialised in constructor

        char *p;

    public:

        iterator &operator+(ptrdiff_t const n)
        {
            assert( p+n >= p_min );
            assert( p+n <= p_max );

            // more code here
        }
    };
};


Similarly the unary operator* could be overloaded to catch the error
when "end()" gets dereferenced.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-23 21:05 string::iterator should have more error checking Frederick Virchanza Gotham
@ 2022-06-23 21:26 ` Jonathan Wakely
  2022-06-23 22:01   ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2022-06-23 21:26 UTC (permalink / raw)
  To: cauldwell.thomas; +Cc: libstdc++

On Thu, 23 Jun 2022 at 22:05, Frederick Virchanza Gotham via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> If a program is compiled with "-D_GLIBCXX_DEBUG", I would expect it at
> runtime to catch the error on the last line in the following program:
>
> #include <iostream>
> #include <string>
> #include <string_view>
> #include <type_traits>
>
> using namespace std;
>
> int main(void)
> {
>     cout << "string::const_iterator is "
>          << (is_same_v< string::const_iterator, char const * > ? "just
> a raw pointer" : "NOT a simple pointer") << endl;
>
>     cout << "string_view::const_iterator is "
>          << (is_same_v< string_view::const_iterator, char const * > ?
> "just a raw pointer" : "NOT a simple pointer") << endl;
>
>     string s("brush");
>
>     cout << string_view( &*(s.cbegin() + 1u), &*(s.cend() + 876u) ) << endl;
> }
>
>
> string::iterator is NOT a simple pointer -- it is a class and so we
> can overload the following operators to catch errors:
>
> (1) unary operator*
> (2) binary operator+
> (3) binary operator-
>
> The error on the last line of the above program would be caught at
> runtime if the iterator were written as follows:
>
> class string {
>
>     class iterator {
>
>         char const *const p_min, *const p_max;  // initialised in constructor
>
>         char *p;
>
>     public:
>
>         iterator &operator+(ptrdiff_t const n)
>         {
>             assert( p+n >= p_min );
>             assert( p+n <= p_max );
>
>             // more code here
>         }
>     };
> };

I don't think it would be a good idea to do it like this when we
already have an entire Debug Mode framework for handling iterator
validity. You can see how that handles your exmples by repalcing
std::string with __gnu_debug::string (defined in <debug/string>).


> Similarly the unary operator* could be overloaded to catch the error
> when "end()" gets dereferenced.

The lack of iterator checking is documented at
https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode_semantics.html
and unlikely to change now. IIRC the main reasons for not adding
iterator checking to std::string were that it would hurt performance,
and that most std::string operations aren't done with iterators
anyway. As soon as you use c_str() or data() to get a raw pointer,
there's nothing that checked iterators can do.

AddressSanitizer does give an error for your example though.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-23 21:26 ` Jonathan Wakely
@ 2022-06-23 22:01   ` Frederick Virchanza Gotham
  2022-06-23 22:35     ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2022-06-23 22:01 UTC (permalink / raw)
  To: libstdc++

On Thu, Jun 23, 2022 at 10:26 PM Jonathan Wakely <jwakely@redhat.com> wrote:

> AddressSanitizer does give an error for your example though.


If you're talking about using "g++" with the command line option
"-fsanitize=address", well I've tested it and it **doesn't** give an
error for my sample program.

I thought that "-D_GLIBCXX_DEBUG" would make it use
"__gnu_debug::string". So are you saying that I'll have to do the
following in my code?

#ifdef NDEBUG
    #include <string>
    typedef std::string string;
#else
    #include <debug/string>
    typedef __gnu_debug::string string;
#endif

Is this what I'm supposed to do?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-23 22:01   ` Frederick Virchanza Gotham
@ 2022-06-23 22:35     ` Jonathan Wakely
  2022-06-24  9:06       ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2022-06-23 22:35 UTC (permalink / raw)
  To: cauldwell.thomas; +Cc: libstdc++

On Thu, 23 Jun 2022 at 23:01, Frederick Virchanza Gotham via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> On Thu, Jun 23, 2022 at 10:26 PM Jonathan Wakely <jwakely@redhat.com> wrote:
>
> > AddressSanitizer does give an error for your example though.
>
>
> If you're talking about using "g++" with the command line option
> "-fsanitize=address", well I've tested it and it **doesn't** give an
> error for my sample program.

Yes it does:
https://godbolt.org/z/hYhcfjTWY



>
> I thought that "-D_GLIBCXX_DEBUG" would make it use
> "__gnu_debug::string". So are you saying that I'll have to do the
> following in my code?
>
> #ifdef NDEBUG
>     #include <string>
>     typedef std::string string;
> #else
>     #include <debug/string>
>     typedef __gnu_debug::string string;
> #endif
>
> Is this what I'm supposed to do?

Yes, for full Debug Mode checks you need to use __gnu_debug::string,
as documented in the manual.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-23 22:35     ` Jonathan Wakely
@ 2022-06-24  9:06       ` Frederick Virchanza Gotham
  2022-06-24  9:35         ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2022-06-24  9:06 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++

On Thu, Jun 23, 2022 at 11:36 PM Jonathan Wakely <jwakely@redhat.com> wrote:

> Yes it does:
> https://godbolt.org/z/hYhcfjTWY


You're right, that does work.

I've put together another little sample program. The dereferencing of
an "end()" iterator is flagged by __gnu_debug::_Safe_iterator, however
the invalid memory access on the last line -- strangely -- is not
noticed by "-fsanitize"

        https://godbolt.org/z/4nzjGG8o3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-24  9:06       ` Frederick Virchanza Gotham
@ 2022-06-24  9:35         ` Jonathan Wakely
  2022-06-24 10:10           ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2022-06-24  9:35 UTC (permalink / raw)
  To: cauldwell.thomas; +Cc: libstdc++

On Fri, 24 Jun 2022 at 10:06, Frederick Virchanza Gotham
<cauldwell.thomas@gmail.com> wrote:
>
> On Thu, Jun 23, 2022 at 11:36 PM Jonathan Wakely <jwakely@redhat.com> wrote:
>
> > Yes it does:
> > https://godbolt.org/z/hYhcfjTWY
>
>
> You're right, that does work.
>
> I've put together another little sample program. The dereferencing of
> an "end()" iterator is flagged by __gnu_debug::_Safe_iterator, however
> the invalid memory access on the last line -- strangely -- is not
> noticed by "-fsanitize"
>
>         https://godbolt.org/z/4nzjGG8o3

That's expected. The contents of the string_view are a string literal,
which is in the program image, not on the heap. AddressSanitizer
doesn't check access to such memory.
The byte after the string literal "brush" is uninitialized, but it
does exist in a valid memory page.

And it's not detected by Debug Mode because string_view iterators are
just pointers.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-24  9:35         ` Jonathan Wakely
@ 2022-06-24 10:10           ` Frederick Virchanza Gotham
  2022-06-24 10:28             ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2022-06-24 10:10 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++

On Fri, Jun 24, 2022 at 10:35 AM Jonathan Wakely <jwakely@redhat.com> wrote:

> That's expected. The contents of the string_view are a string literal,
> which is in the program image, not on the heap.


Valgrind is the best tool for troubleshooting heap problems (although
it's useless against non-heap problems).


> AddressSanitizer
> doesn't check access to such memory.
> The byte after the string literal "brush" is uninitialized, but it
> does exist in a valid memory page.


Check out how "-fsanitize" flags the following invalid access to
static-duration constexpr arrays:

        https://godbolt.org/z/fjnPTWj5P

Having thought about this a little more, I realise that "brush" in memory is:

      char constexpr brush[6u] = { 'b', 'r', 'u', 's', 'h', '\0' };

And so therefore it is perfectly valid to access the null terminator
located at brush[5u], which is why the access is not flagged by
"-fsanitize".


> And it's not detected by Debug Mode because string_view iterators are
> just pointers.


I understand that. I think there should be "<debug/string_view>" and
that it should be optionally configurable to barf when it encounters a
null terminator. (It must be optionally configurable because a null
char is actually a valid char inside a string_view). Something like:

    #define _GLIBCXX_STRINGVIEW_DONT_ALLOW_NULL_CHAR
    #include <debug/string_view>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-24 10:10           ` Frederick Virchanza Gotham
@ 2022-06-24 10:28             ` Jonathan Wakely
  2022-06-24 11:14               ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2022-06-24 10:28 UTC (permalink / raw)
  To: cauldwell.thomas; +Cc: libstdc++

On Fri, 24 Jun 2022 at 11:11, Frederick Virchanza Gotham
<cauldwell.thomas@gmail.com> wrote:
>
> On Fri, Jun 24, 2022 at 10:35 AM Jonathan Wakely <jwakely@redhat.com> wrote:
>
> > That's expected. The contents of the string_view are a string literal,
> > which is in the program image, not on the heap.
>
>
> Valgrind is the best tool for troubleshooting heap problems (although
> it's useless against non-heap problems).
>
>
> > AddressSanitizer
> > doesn't check access to such memory.
> > The byte after the string literal "brush" is uninitialized, but it
> > does exist in a valid memory page.
>
>
> Check out how "-fsanitize" flags the following invalid access to
> static-duration constexpr arrays:
>
>         https://godbolt.org/z/fjnPTWj5P
>
> Having thought about this a little more, I realise that "brush" in memory is:
>
>       char constexpr brush[6u] = { 'b', 'r', 'u', 's', 'h', '\0' };
>
> And so therefore it is perfectly valid to access the null terminator
> located at brush[5u], which is why the access is not flagged by
> "-fsanitize".

But &*(s2.cend() + 1u) accesses the byte after the null, doesn't it?

The issue is more that the compiler turns &*p into just p, and then
there is no further *p dereference that can be seen by ASan. It's
still undefined though.

>
> > And it's not detected by Debug Mode because string_view iterators are
> > just pointers.
>
>
> I understand that. I think there should be "<debug/string_view>" and
> that it should be optionally configurable to barf when it encounters a
> null terminator. (It must be optionally configurable because a null
> char is actually a valid char inside a string_view). Something like:
>
>     #define _GLIBCXX_STRINGVIEW_DONT_ALLOW_NULL_CHAR
>     #include <debug/string_view>

A debug string_view is unlikely to happen unless you do the work to add it.

I don't think that extra check is appropriate though. As you say, it's
entirely valid according to the standard, so if your application wants
to disallow that, checking for it belongs in your application.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: string::iterator should have more error checking
  2022-06-24 10:28             ` Jonathan Wakely
@ 2022-06-24 11:14               ` Frederick Virchanza Gotham
  0 siblings, 0 replies; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2022-06-24 11:14 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++

On Fri, Jun 24, 2022 at 11:28 AM Jonathan Wakely <jwakely@redhat.com> wrote:

> But &*(s2.cend() + 1u) accesses the byte after the null, doesn't it?


It dereferences a pointer to the byte after the null, certainly, but
there isn't actually any reading/writing of the byte after the null.
I'm pretty sure that's why "-fsanitize" doesn't notice a problem.

However . . . check out the following:

        https://godbolt.org/z/jPcv9M6e3

Strangely it doesn't mind printing an invalid "uintptr_t", but it does
mind printing an invalid pointer.


> I don't think that extra check is appropriate though. As you say, it's
> entirely valid according to the standard, so if your application wants
> to disallow that, checking for it belongs in your application.


Yeah maybe I should inherit from string_view.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-06-24 11:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-23 21:05 string::iterator should have more error checking Frederick Virchanza Gotham
2022-06-23 21:26 ` Jonathan Wakely
2022-06-23 22:01   ` Frederick Virchanza Gotham
2022-06-23 22:35     ` Jonathan Wakely
2022-06-24  9:06       ` Frederick Virchanza Gotham
2022-06-24  9:35         ` Jonathan Wakely
2022-06-24 10:10           ` Frederick Virchanza Gotham
2022-06-24 10:28             ` Jonathan Wakely
2022-06-24 11:14               ` Frederick Virchanza Gotham

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).