public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jonathan Wakely <jwakely.gcc@gmail.com>
Cc: Jonathan Wakely <jwakely@redhat.com>,
	libstdc++ <libstdc++@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [committed] libstdc++: Disable over-zealous warnings about std::string copies [PR103332]
Date: Fri, 10 Dec 2021 09:35:50 -0700	[thread overview]
Message-ID: <8a43115a-fab5-e2e7-6737-e0fbc43ebb59@gmail.com> (raw)
In-Reply-To: <CAH6eHdR3jD1GNg5cku3vgBDqjmmnb6fGrg1c3bvtb5RC+qYp_Q@mail.gmail.com>

On 12/10/21 3:12 AM, Jonathan Wakely wrote:
> On Fri, 10 Dec 2021 at 01:50, Martin Sebor via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
>>
>> On 12/9/21 5:38 PM, Martin Sebor wrote:
>>> On 12/9/21 4:24 PM, Jonathan Wakely via Gcc-patches wrote:
>>>> These warnings are triggered by perfectly valid code using std::string.
>>>> They're particularly bad when --enable-fully-dynamic-string is used,
>>>> because even std::string().begin() will give a warning.
>>>>
>>>> Use pragmas to stop the troublesome warnings for copies done by
>>>> std::char_traits.
>>>
>>> I'm still experimenting with some of the approaches we discussed
>>> last week, but based on my findings so far this was going to be
>>> my suggestion at lest for now, until or unless the problem turns
>>> out to affect more code than just std::string.
>>
>> Just minutes after I wrote this I tried following the clue
>> in the note printed for the test case from PR 103534 with
>> an enhancement I'm experimenting with:
>>
>> /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:426:56:
>> warning: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’
>> specified size between 18446744073709551600 and 18446744073709551615
>> exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
>>     426 |         return static_cast<char_type*>(__builtin_memcpy(__s1,
>> __s2, __n));
>>         |
>> ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
>> /build/gcc-master/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:426:56:
>> note: when
>> ‘(<anonymous>.std::__cxx11::basic_string<char>::_M_string_length >
>> 18446744073709551599)’
>>
>> and adding an assert to string::size():
>>
>>         constexpr
>>         size_type
>>         size() const noexcept
>>         {
>>           if (_M_string_length >= -1LU >> 1)
> 
> I think that will be wrong for 64-bit Windows, where long is 32-bit,
> but the string's max size is closer to -1LLU >> 2, and we don't want
> to just use -1LLU because that's wrong for 32-bit targets.
> 
> Technically the max size is (N - 1)/2 where N is the allocator's max
> size, which is PTRDIFF_MAX for std::allocator, SIZE_MAX for allocators
> that use the default value from std::allocator_traits, or some other
> value that the allocator defines. And the value is reduced
> appropriately if sizeof(char_type) > 1.
> 
> But if we call this->max_size() which calls _Alloc_traits::max_size()
> which potentially calls allocator_type::max_size(), will the compiler
> actually make those calls to mark the path unreachable, or will it
> just ignore the unreachable if it can't fold all those calls at
> compile time?

The above was just a quick proof of concept experiment.  You're
of course right that the final solution can't be so crude(*).
But if the required functions are always_inline (I think member
functions defined in the body of the class implicitly are) then
I'd expect them to be folded at compile time at all optimization
levels.  That's what makes -Winvalid-memory-order work in C++
even at -O0 in the patch I posted earlier this week.

If I still remember my C++-library-fu, even though the standard
requires containers to call max_size() etc., since std::string
is (or can be) an explicit specialization, there shouldn't be
a way for a conforming program to find out if it does, so it
could take shortcuts.  That makes me wonder if it could even
call malloc instead of operator new to get around the problem
with the operator's replaceability.

> 
> We could just use __SIZE_MAX__/2 which might be larger than the real
> maximum, but will never be smaller.
> 
> Jason made a suggestion last week which was that if the warning code
> has a range like [2,INT_MAX] or [2UL,ULONG_MAX] that it should assume
> that is really [2,infinity] and so not warn. If the upper bound is the
> maximum possible value for the type, that's effectively unbounded, and
> the warning could assume it's simply not got enough information to
> give a useful warning. If it has a range like [2,300] for a buffer of
> length 100, that should warn. But [2,infinity] is effectively ranger
> saying "I have no idea what this value is" and the warning machinery
> should not be issuing warnings on the basis of "I have no idea what
> this value is".

The warnings use the lower bound of the access size and ignore
the upper bound.  They use the upper bound of the space remaining
in the destination(s) and ignore the lower bound.

In a call to memcpy (P + I, s, N) the access size is N, and
the space remaining in A is the upper bound of the size of
the largest array A that P points to (it might point to any
number of them, and some might be dynamically allocated with
a size in some range) minus the lower bound of I or zero,
whichever is greater.

This is the most conservative strategy.  There are other
strategies where we could get more true positives but also
more false positives.  Those might be appropriate for
the "maybe" kind of warnings.

Martin

>>             __builtin_unreachable ();
>>           return _M_string_length;
>>         }
>>
>> That gets rid of the false positive in this PR.  I realize
>> the others happen for other reasons but this approach at least
>> suggests that there might be other ways to suppress them than
>> the #pragma.  Unlike it, the alternative approaches should also
>> improve codegen.


  reply	other threads:[~2021-12-10 16:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 23:24 Jonathan Wakely
2021-12-10  0:38 ` Martin Sebor
2021-12-10  1:49   ` Martin Sebor
2021-12-10 10:12     ` Jonathan Wakely
2021-12-10 16:35       ` Martin Sebor [this message]
2021-12-10 16:41         ` Jakub Jelinek
2021-12-10 17:11           ` Martin Sebor
2021-12-10 17:17             ` Jakub Jelinek
2021-12-10 18:16               ` Jonathan Wakely
2021-12-10 18:10         ` Jonathan Wakely
2021-12-10  9:51   ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a43115a-fab5-e2e7-6737-e0fbc43ebb59@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jwakely.gcc@gmail.com \
    --cc=jwakely@redhat.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).