public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/106926] New: string_view construction from literal string containing null/zero should warn
@ 2022-09-13 12:27 jzwinck at gmail dot com
  2022-09-13 14:48 ` [Bug c++/106926] " redi at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: jzwinck at gmail dot com @ 2022-09-13 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106926

            Bug ID: 106926
           Summary: string_view construction from literal string
                    containing null/zero should warn
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jzwinck at gmail dot com
  Target Milestone: ---

This code compiles but does something the programmer almost certainly does not
want:

    #include <string_view>
    constexpr std::string_view sv = "four\0nine"; // 9 bytes of data
    static_assert(sv.size() == 4); // required by C++, but surprising

GCC can't implement what the programmer intended, so it should warn if a
string_view is constructed or assigned from a string literal which contains
null bytes before the end.

When this happens, the rest of the string literal data is still accessible but
only if you know it's there by some other means.  I expect such usage to be
very rare, so warning about it even at -Wall seems reasonable.  Even if the
warning only comes with -Wextra that would be better than silence.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug c++/106926] string_view construction from literal string containing null/zero should warn
  2022-09-13 12:27 [Bug c++/106926] New: string_view construction from literal string containing null/zero should warn jzwinck at gmail dot com
@ 2022-09-13 14:48 ` redi at gcc dot gnu.org
  2022-09-15 18:21 ` jzwinck at gmail dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: redi at gcc dot gnu.org @ 2022-09-13 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106926

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic
             Blocks|                            |87403
   Last reconfirmed|                            |2022-09-13
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Is this a real problem you've seen in the wild?

The same argument applies to std::string, doesn't it?

To generalise it, I think we would probably want a new attribute telling the
compiler that a character pointer argument will be passed to strlen, and only
that many bytes used (i.e. it's treated as a "null-terminated byte string" in
ISO C++ terminology). Then the compiler can warn if an argument to that
function has unreachable characters.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87403
[Bug 87403] [Meta-bug] Issues that suggest a new warning

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug c++/106926] string_view construction from literal string containing null/zero should warn
  2022-09-13 12:27 [Bug c++/106926] New: string_view construction from literal string containing null/zero should warn jzwinck at gmail dot com
  2022-09-13 14:48 ` [Bug c++/106926] " redi at gcc dot gnu.org
@ 2022-09-15 18:21 ` jzwinck at gmail dot com
  2022-09-15 20:21 ` redi at gcc dot gnu.org
  2022-09-16 14:24 ` jzwinck at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: jzwinck at gmail dot com @ 2022-09-15 18:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106926

--- Comment #2 from John Zwinck <jzwinck at gmail dot com> ---
Jonathan, yes it was a real problem, I wrote such buggy code myself.  I was
more complacent with string_view than I might have been with std::string
because everyone knows string_view doesn't have to be null terminated (bad
excuse, but it's mine).

I agree the same problem could happen with std::string.  As for your idea to
add an attribute, I assume you mean something like this:

    string_view(const char* s __attribute__((does_strlen)));

I think this would work but it seems like the attribute would have to be added
in many places.  Instead, the compiler could statically determine that the
length of the string is lost in code like this:

    string_view foo("bad\0string");

And maybe even here:

    const char* bar = "another\0one";

Though that may be a step too far because someone could hard-code the length 12
elsewhere, and there probably is code in the wild doing that.

In general the diagnostic could apply wherever the compiler knows the contents
will be copied.  Since that's not always possible to know, maybe it could
assume copying will happen when the literal is passed to an out-of-line
function.

I recognize this might be too harsh for -Wall; I'd still be quite happy to see
the warning with -Wextra.  The attribute idea is also fine, if you think it's
feasible to apply it in enough places and not too ugly.

Thank you.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug c++/106926] string_view construction from literal string containing null/zero should warn
  2022-09-13 12:27 [Bug c++/106926] New: string_view construction from literal string containing null/zero should warn jzwinck at gmail dot com
  2022-09-13 14:48 ` [Bug c++/106926] " redi at gcc dot gnu.org
  2022-09-15 18:21 ` jzwinck at gmail dot com
@ 2022-09-15 20:21 ` redi at gcc dot gnu.org
  2022-09-16 14:24 ` jzwinck at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: redi at gcc dot gnu.org @ 2022-09-15 20:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106926

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to John Zwinck from comment #2)
> I agree the same problem could happen with std::string.  As for your idea to
> add an attribute, I assume you mean something like this:
> 
>     string_view(const char* s __attribute__((does_strlen)));

Yes.

> I think this would work but it seems like the attribute would have to be
> added in many places.

Why? I don't think there are that many places.

>  Instead, the compiler could statically determine that
> the length of the string is lost in code like this:
> 
>     string_view foo("bad\0string");

How would it determine that? By inlining the eventual call to strlen (several
function calls down from the location of the literal)?

> And maybe even here:
> 
>     const char* bar = "another\0one";
> 
> Though that may be a step too far because someone could hard-code the length
> 12 elsewhere, and there probably is code in the wild doing that.

I agree that's not a good idea.

> In general the diagnostic could apply wherever the compiler knows the
> contents will be copied.

But that doesn't apply to string_view, there's no copying. And such a warning
would require optimization and inlining to be effective. How would the compiler
know that? That's why I suggested an attribute, because the call site can give
a warning without having to know what happens inside the function, and
everything that it calls.

Relying on arbitrarily deep inlining doesn't sound like it will be very
effective, nor does expecting the compiler to just "know" which functions this
applies to.

>  Since that's not always possible to know, maybe it
> could assume copying will happen when the literal is passed to an
> out-of-line function.

That seems like a bad idea too. Every function is "an out of line function" at
-O0.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug c++/106926] string_view construction from literal string containing null/zero should warn
  2022-09-13 12:27 [Bug c++/106926] New: string_view construction from literal string containing null/zero should warn jzwinck at gmail dot com
                   ` (2 preceding siblings ...)
  2022-09-15 20:21 ` redi at gcc dot gnu.org
@ 2022-09-16 14:24 ` jzwinck at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: jzwinck at gmail dot com @ 2022-09-16 14:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106926

--- Comment #4 from John Zwinck <jzwinck at gmail dot com> ---
Jonathan, thank you for the discussion, I agree with your points and think
adding an attribute for this makes sense.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-09-16 14:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-13 12:27 [Bug c++/106926] New: string_view construction from literal string containing null/zero should warn jzwinck at gmail dot com
2022-09-13 14:48 ` [Bug c++/106926] " redi at gcc dot gnu.org
2022-09-15 18:21 ` jzwinck at gmail dot com
2022-09-15 20:21 ` redi at gcc dot gnu.org
2022-09-16 14:24 ` jzwinck at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).