[Bug analyzer/106392] New: Support iteration over C++ containers in -fanalyzer

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug analyzer/106392] New: Support iteration over C++ containers in -fanalyzer
@ 2022-07-21 18:00 redi at gcc dot gnu.org
  2023-06-08 17:08 ` [Bug analyzer/106392] " vultkayn at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: redi at gcc dot gnu.org @ 2022-07-21 18:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106392

            Bug ID: 106392
           Summary: Support iteration over C++ containers in -fanalyzer
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: analyzer
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
            Blocks: 97110
  Target Milestone: ---

I haven't really thought this one through completely, and maybe it's too hard,
but it would be good to check that 'for' loops using iterators don't do silly
things, e.g.

for (auto iter = cont.begin(); iter != cont.end() - 1; ++iter)
  // ...

This is unsafe if the container is not empty, because end-1 is invalid.
Similarly for begin+1 when the container is empty. Need to check cont.begin()
!= cont.end() before doing begin+1 or end-1

And end+1 is always wrong.

Range-based for loops should not modify the container being iterated over:

for (auto& x : cont)
  if (x == 3)
    cont.push_back(4);

Modifying the container while traversing it might invalidate the iterators and
have undefined behaviour. (No iterators are visible in the code, but the
compiler simply expands the range-based 'for' into a traditional 'for' using
two iterators, and just because the user can't see them, those iterators are
still invalidated by modifications to the container.

Modifying the container in any loop is potentially risky, if not done
correctly. For an associative container such as std::set, std::map etc. this is
wrong:

  for (auto it = set.begin(); it != set.end(); ++it)
    if (*it == 3)
      set.erase(it);

The ++it increment is undefined after erasing that element, because the erase
call invalidates the iterator.

This solves the invalidation problem, but is still wrong:

  for (auto it = set.begin(); it != set.end(); ++it)
    if (*it == 3)
      it = set.erase(it); // returns iterator to the element after the erased
one

Now we get a valid iterator back from erase, but then we skip the next element
because the ++it always happens at the end of each loop iteration. And if the
erased element was the last one then erase returns the end() iterator, then we
do ++it on the past-the-end iterator, which is undefined.

This is the right way to do it:

  for (auto it = set.begin(); it != set.end(); )
    if (*it == 3)
      it = set.erase(it);
    else
      ++it;

But ideally, just don't. Use std::erase_if(cont, 3) instead.

It would be great if the analyzer understood enough to flag some of these.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97110
[Bug 97110] [meta-bug] tracker bug for supporting C++ in -fanalyzer

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug analyzer/106392] Support iteration over C++ containers in -fanalyzer
  2022-07-21 18:00 [Bug analyzer/106392] New: Support iteration over C++ containers in -fanalyzer redi at gcc dot gnu.org
@ 2023-06-08 17:08 ` vultkayn at gcc dot gnu.org
  2023-06-08 19:46 ` redi at gcc dot gnu.org
  2023-06-08 19:57 ` vultkayn at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: vultkayn at gcc dot gnu.org @ 2023-06-08 17:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106392

Benjamin Priour <vultkayn at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vultkayn at gcc dot gnu.org

--- Comment #1 from Benjamin Priour <vultkayn at gcc dot gnu.org> ---
Access to end iterator should be flagged as undefined behavior.
Detecting invalidation is a difficult problem, as many methods are only
sometimes invalidating.
Checking a reallocation of the underlying container, e.g. for vectors, is
(always?) a good check, but it is not sufficient, e.g. consider
unordered_map.rehash(0)

Additionally, flagging all calls to "possibly invalidating operations" will
obviously lead to too
many false positive.

Thus, it might be relevant to have multiple attributes that keep track of
different kinds of invalidation.
Invalidating operations not only differ on their certainty, but on the
iterator(s) they invalidate.
Some will invalidate every iterators, while some

[[gnu::invalidate]] - always invalidate
[[gnu::invalidate(resize[, size_varname, capacity_varname, factor_varname ])]]
- All iterators are invalidated ? Examples of such case is a vector's size
exceeding its capacity, or a hashmap too loaded.
    Would be detected by keeping track of push*, emplace*, insert* methods, as
well as clear, extract, erase.
    If size_, capacity_ and factor_varname are provided, the invalidation is
done only if size/capacity >= factor. 
[[gnu::invalidate(swap)]] - Both containers should be invalidated. Name
probably ill-chosen since swapping two lists invalidates nothing. 
[[gnu::invalidate(rehash)]] - Generic remapping of every key -> element. This
one I think could be entirely replaced by the following.
[[gnu::invalidate(adaptor)]] - Look at the underlying container invalidation
rule.
[[gnu::invalidate(_from_[, _to_[, _step_]])]] - Invalidate all iterators from
_from_ to _to_ (default end()), with a jump of _step_ (default 1).

Or combine them.

For adaptors, we must look at their underlying containers.

What policy to take for unknown parameters ?
Typically if the load factor is determined at runtime, we simply cannot
precisely determine the event.
I believe that in such case, either the previous known value is used by the
analyzer as a heuristic, or if no other known user-provided value was given,
we use the implementation's default value.

When invalidating a range, how to determine the "following" iterators, that
should be invalidated, when we are not dealing with random iterators ?
Is there even such a case in the standard library, where a method over a
container without random iterators invalidates a range/ a subset of all
iterators ?
I didn't find any, and it is also counter-intuitive, and std::*_list certainly
are not.

More research needed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug analyzer/106392] Support iteration over C++ containers in -fanalyzer
  2022-07-21 18:00 [Bug analyzer/106392] New: Support iteration over C++ containers in -fanalyzer redi at gcc dot gnu.org
  2023-06-08 17:08 ` [Bug analyzer/106392] " vultkayn at gcc dot gnu.org
@ 2023-06-08 19:46 ` redi at gcc dot gnu.org
  2023-06-08 19:57 ` vultkayn at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: redi at gcc dot gnu.org @ 2023-06-08 19:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106392

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Benjamin Priour from comment #1)
> [[gnu::invalidate(swap)]] - Both containers should be invalidated. Name
> probably ill-chosen since swapping two lists invalidates nothing. 

Swapping doesn't invalidate iterators for any standard container.

I think the unordered containers might be too hard. I would start with
std::vector, as that will probably give the best return on investment of
effort.

> When invalidating a range, how to determine the "following" iterators, that
> should be invalidated, when we are not dealing with random iterators ?
> Is there even such a case in the standard library, where a method over a
> container without random iterators invalidates a range/ a subset of all
> iterators ?

erase(first, last) invalidates iterators in the erased range.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug analyzer/106392] Support iteration over C++ containers in -fanalyzer
  2022-07-21 18:00 [Bug analyzer/106392] New: Support iteration over C++ containers in -fanalyzer redi at gcc dot gnu.org
  2023-06-08 17:08 ` [Bug analyzer/106392] " vultkayn at gcc dot gnu.org
  2023-06-08 19:46 ` redi at gcc dot gnu.org
@ 2023-06-08 19:57 ` vultkayn at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: vultkayn at gcc dot gnu.org @ 2023-06-08 19:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106392

--- Comment #3 from Benjamin Priour <vultkayn at gcc dot gnu.org> ---

> I think the unordered containers might be too hard. I would start with
> std::vector, as that will probably give the best return on investment of
> effort.
> 

Indeed, I just found these slides from clang
https://llvm.org/devmtg/2018-04/slides/Balogh-Finding%20Iterator-related%20Errors%20with%20Clang%20Static%20Analyzer.pdf

They seem to be dividing the containers into 3 categories, list-like,
vector-like and deque-like, thus also sticking to ordered containers.

But they also are kind of brute-forcing the checks it seems, as they say upon
insertion or deletion, they check every iterator position of the container.

It sounds overly expensive to me, particularly for lists.
Surely there is a reason they are doing that, I'm looking into it.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-06-08 19:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-21 18:00 [Bug analyzer/106392] New: Support iteration over C++ containers in -fanalyzer redi at gcc dot gnu.org
2023-06-08 17:08 ` [Bug analyzer/106392] " vultkayn at gcc dot gnu.org
2023-06-08 19:46 ` redi at gcc dot gnu.org
2023-06-08 19:57 ` vultkayn at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).