public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair
@ 2021-10-04 12:54 dean.gcc.bugzilla at gmail dot com
  2021-10-04 13:22 ` [Bug libstdc++/102592] [11/12 Regression] " redi at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: dean.gcc.bugzilla at gmail dot com @ 2021-10-04 12:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

            Bug ID: 102592
           Summary: heap-use-after-free when constructing
                    std::filesystem::path from iterator pair
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dean.gcc.bugzilla at gmail dot com
  Target Milestone: ---

The std::filesystem::path constructor that takes an iterator pair produces a
heap-use-after-free error (address sanitizer) in case the iterators are
non-contiguous and value_type != char.

Full self-contained reproducible example on Compiler Explorer:
https://godbolt.org/z/3nxqq5Ef8 I've also included the reproduction code and
error below for completeness.

Affected versions: 11.x, trunk
Not affected: 10.x

As far as I've been able to track this in the std::filesystem::path
constructor, for the case of non-contiguous, non-char iterators, a temporary
container is created and then assigned to a std::string_view. However that temp
container has a shorter lifetime than the std::string_view.

The code:

#include <filesystem>

namespace fs = std::filesystem;

fs::path to_path(std::string_view sv) {
    struct AsU8 {
        std::string_view::const_iterator it;

        using iterator_category = std::forward_iterator_tag;
        using difference_type = std::ptrdiff_t;
        using value_type = char8_t;
        using reference = const value_type&;
        using pointer = const value_type*;

        char8_t operator*() const { return static_cast<char8_t>(*it); }
        AsU8& operator++() { return ++it, *this; }
        bool operator==(const AsU8&) const = default;
    };
    return {AsU8{sv.begin()}, AsU8{sv.end()}};
}

int main() {
    return
to_path("/long/path/to/a/file/to/avoid/small/string/opt").string().size();
}

=1==ERROR: AddressSanitizer: heap-use-after-free on address 0x604000000010 at
pc 0x7fed85f999c0 bp 0x7ffdddc9ccc0 sp 0x7ffdddc9c470
READ of size 46 at 0x604000000010 thread T0
    #0 0x7fed85f999bf in __interceptor_memcpy
(/opt/compiler-explorer/gcc-11.2.0/lib64/libasan.so.6+0x3c9bf)
    #1 0x4039e5 in std::char_traits<char>::copy(char*, char const*, unsigned
long)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/char_traits.h:409
    #2 0x405214 in std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >::_S_copy(char*, char const*, unsigned long)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:359
    #3 0x4058f1 in std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >::_S_copy_chars(char*, char const*, char const*)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:406
    #4 0x405109 in void std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char
const*, char const*, std::forward_iterator_tag)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.tcc:225
    #5 0x40578b in void std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char
const*>(char const*, char const*, std::__false_type)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:255
    #6 0x404a8e in void std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char
const*, char const*)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:274
    #7 0x404119 in std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string(char const*, unsigned long,
std::allocator<char> const&)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:521
    #8 0x404e66 in std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >::__sv_wrapper,
std::allocator<char> const&)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:154
    #9 0x404373 in std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string<std::basic_string_view<char,
std::char_traits<char> >, void>(std::basic_string_view<char,
std::char_traits<char> > const&, std::allocator<char> const&)
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/basic_string.h:664
    #10 0x402c63 in path<to_path(std::string_view)::AsU8>
/opt/compiler-explorer/gcc-11.2.0/include/c++/11.2.0/bits/fs_path.h:292
    #11 0x4024ae in to_path[abi:cxx11](std::basic_string_view<char,
std::char_traits<char> >) /app/example.cpp:19
    #12 0x40265c in main /app/example.cpp:23
    #13 0x7fed85a140b2 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #14 0x40226d in _start (/app/output.s+0x40226d)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/102592] [11/12 Regression] heap-use-after-free when constructing std::filesystem::path from iterator pair
  2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
@ 2021-10-04 13:22 ` redi at gcc dot gnu.org
  2021-10-04 13:27 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-04 13:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|heap-use-after-free when    |[11/12 Regression]
                   |constructing                |heap-use-after-free when
                   |std::filesystem::path from  |constructing
                   |iterator pair               |std::filesystem::path from
                   |                            |iterator pair
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2021-10-04
      Known to work|                            |10.3.1
             Status|UNCONFIRMED                 |ASSIGNED
      Known to fail|                            |11.2.1, 12.0

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The problem is the last line of this function:

    template<typename _Tp>
      static auto
      _S_convert(const _Tp& __str)
      {
        if constexpr (is_same_v<_Tp, string_type>)
          return __str;
        else if constexpr (is_same_v<_Tp, basic_string_view<value_type>>)
          return __str;
        else if constexpr (is_same_v<typename _Tp::value_type, value_type>)
          return basic_string_view<value_type>(__str.data(), __str.size());
        else
          return _S_convert(__str.data(), __str.data() + __str.size());
      }

That returns a basic_string_view<char8_t> referring to the contents of __str,
but that is an rvalue basic_string<char8_t> that goes out of scope before the
result is used.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/102592] [11/12 Regression] heap-use-after-free when constructing std::filesystem::path from iterator pair
  2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
  2021-10-04 13:22 ` [Bug libstdc++/102592] [11/12 Regression] " redi at gcc dot gnu.org
@ 2021-10-04 13:27 ` redi at gcc dot gnu.org
  2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-04 13:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|---                         |11.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/102592] [11/12 Regression] heap-use-after-free when constructing std::filesystem::path from iterator pair
  2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
  2021-10-04 13:22 ` [Bug libstdc++/102592] [11/12 Regression] " redi at gcc dot gnu.org
  2021-10-04 13:27 ` redi at gcc dot gnu.org
@ 2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
  2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-10-13 19:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:85b24e32dc27ec2e70b853713e0713cbc1ff08c3

commit r12-4380-g85b24e32dc27ec2e70b853713e0713cbc1ff08c3
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 13 17:02:59 2021 +0100

    libstdc++: Fix dangling string_view in filesystem::path [PR102592]

    When creating a path from a pair of non-contiguous iterators we pass the
    iterators to _S_convert(Iter, Iter). That function passes the iterators
    to __string_from_range to get a contiguous sequence of characters, and
    then calls _S_convert(const C*, const C*) to perform the encoding
    conversions. If the value type, C, is char8_t, then no conversion is
    needed and the _S_convert<char8_t>(const char8_t*, const char8_t*)
    specialization casts the pointer to const char* and returns a
    std::string_view that refs to the char8_t sequence. However, that
    sequence is owned by the std::u8string rvalue returned by
    __string_from_range, which goes out of scope when _S_convert(Iter, Iter)
    returns. That means the std::string_view is dangling and we get
    undefined behaviour when parsing it as a path.

    The same problem does not exist for the path members taking a "Source"
    argument, because those functions all convert a non-contiguous range
    into a basic_string<C> immediately, using __effective_range(__source).
    That means that the rvalue string returned by that function is still in
    scope for the full expression, so the string_view does not dangle.

    The solution for the buggy functions is to do the same thing, and call
    __string_from_range immediately, so that the returned rvalue is still in
    scope for the lifetime of the string_view returned by _S_convert. To
    avoid reintroducing the same problem, remove the _S_convert(Iter, Iter)
    overload that calls __string_from_range and returns a dangling view.

    libstdc++-v3/ChangeLog:

            PR libstdc++/102592
            * include/bits/fs_path.h (path::path(Iter, Iter, format))
            (path::append(Iter, Iter), path::concat(Iter, Iter)): Call
            __string_from_range directly, instead of two-argument overload
            of _S_convert.
            (path::_S_convert(Iter, Iter)): Remove.
            * testsuite/27_io/filesystem/path/construct/102592.C: New test.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/102592] [11/12 Regression] heap-use-after-free when constructing std::filesystem::path from iterator pair
  2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
                   ` (2 preceding siblings ...)
  2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
@ 2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
  2021-10-13 20:54 ` cvs-commit at gcc dot gnu.org
  2021-10-13 20:55 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-10-13 19:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:b83b810ac440f72e7551b6496539e60ac30c0d8a

commit r12-4381-gb83b810ac440f72e7551b6496539e60ac30c0d8a
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 13 17:19:57 2021 +0100

    libstdc++: Refactor filesystem::path encoding conversions

    Adjust the __detail::__effective_range overloads so they always return a
    string or string view using std::char_traits, because we don't care
    about the traits of an incoming string.

    Use std::contiguous_iterator in the __effective_range(const Source&)
    overload, to allow returning a basic_string_view in more cases. For the
    non-contiguous casecl in both __effective_range and __string_from_range,
    return a std::string instead of std::u8string when the value type of the
    range is char8_t.  These changes avoid unnecessary basic_string
    temporaries.

    Also simplify __string_from_range(Iter, Iter) to not need
    std::__to_address for the contiguous case.

    Combine the _S_convert(string_type) and _S_convert(const T&) overloads
    into a single _S_convert(T) function which also avoids the dangling
    view problem of PR 102592 (should that recur somehow).

    libstdc++-v3/ChangeLog:

            * include/bits/fs_path.h (__detail::__is_contiguous): New
            variable template to identify contiguous iterators.
            (__detail::__unified_char8_t): New alias template to decide when
            to treat char8_t as char without encoding conversion.
            (__detail::__effective_range(const basic_string<C,T>&)): Use
            std::char_traits<C> for returned string view.
            (__detail::__effective_range(const basic_string_view<C,T>&)):
            Likewise.
            (__detail::__effective_range(const Source&)): Use
            __is_contiguous to detect mode cases of contiguous iterators.
            Use __unified_char8_t to return a std::string instead of
            std::u8string.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/102592] [11/12 Regression] heap-use-after-free when constructing std::filesystem::path from iterator pair
  2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
                   ` (3 preceding siblings ...)
  2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
@ 2021-10-13 20:54 ` cvs-commit at gcc dot gnu.org
  2021-10-13 20:55 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-10-13 20:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:9ef31bab15a426a6829acf0e7ae5420d3b10cbae

commit r11-9149-g9ef31bab15a426a6829acf0e7ae5420d3b10cbae
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 13 17:02:59 2021 +0100

    libstdc++: Fix dangling string_view in filesystem::path [PR102592]

    When creating a path from a pair of non-contiguous iterators we pass the
    iterators to _S_convert(Iter, Iter). That function passes the iterators
    to __string_from_range to get a contiguous sequence of characters, and
    then calls _S_convert(const C*, const C*) to perform the encoding
    conversions. If the value type, C, is char8_t, then no conversion is
    needed and the _S_convert<char8_t>(const char8_t*, const char8_t*)
    specialization casts the pointer to const char* and returns a
    std::string_view that refs to the char8_t sequence. However, that
    sequence is owned by the std::u8string rvalue returned by
    __string_from_range, which goes out of scope when _S_convert(Iter, Iter)
    returns. That means the std::string_view is dangling and we get
    undefined behaviour when parsing it as a path.

    The same problem does not exist for the path members taking a "Source"
    argument, because those functions all convert a non-contiguous range
    into a basic_string<C> immediately, using __effective_range(__source).
    That means that the rvalue string returned by that function is still in
    scope for the full expression, so the string_view does not dangle.

    The solution for the buggy functions is to do the same thing, and call
    __string_from_range immediately, so that the returned rvalue is still in
    scope for the lifetime of the string_view returned by _S_convert. To
    avoid reintroducing the same problem, remove the _S_convert(Iter, Iter)
    overload that calls __string_from_range and returns a dangling view.

    libstdc++-v3/ChangeLog:

            PR libstdc++/102592
            * include/bits/fs_path.h (path::path(Iter, Iter, format))
            (path::append(Iter, Iter), path::concat(Iter, Iter)): Call
            __string_from_range directly, instead of two-argument overload
            of _S_convert.
            (path::_S_convert(Iter, Iter)): Remove.
            * testsuite/27_io/filesystem/path/construct/102592.C: New test.

    (cherry picked from commit 85b24e32dc27ec2e70b853713e0713cbc1ff08c3)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/102592] [11/12 Regression] heap-use-after-free when constructing std::filesystem::path from iterator pair
  2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
                   ` (4 preceding siblings ...)
  2021-10-13 20:54 ` cvs-commit at gcc dot gnu.org
@ 2021-10-13 20:55 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-13 20:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102592

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Fixed for 11.3, thanks for the report.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-10-13 20:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-04 12:54 [Bug libstdc++/102592] New: heap-use-after-free when constructing std::filesystem::path from iterator pair dean.gcc.bugzilla at gmail dot com
2021-10-04 13:22 ` [Bug libstdc++/102592] [11/12 Regression] " redi at gcc dot gnu.org
2021-10-04 13:27 ` redi at gcc dot gnu.org
2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
2021-10-13 19:39 ` cvs-commit at gcc dot gnu.org
2021-10-13 20:54 ` cvs-commit at gcc dot gnu.org
2021-10-13 20:55 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).