public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
@ 2021-01-14 12:04 egor_suvorov at mail dot ru
  2021-01-14 14:17 ` [Bug libstdc++/98677] " redi at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: egor_suvorov at mail dot ru @ 2021-01-14 12:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

            Bug ID: 98677
           Summary: std::regex constructor triggers valgrind under clang++
                    with undefined sanitizer; possible use-after-move
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: egor_suvorov at mail dot ru
  Target Milestone: ---

Consider the following code:

#include <regex>
int main() {
    std::regex regex("x{2,}");
}

If I compile and run it at Ubuntu 20.04 with

clang++-10 -fsanitize=undefined -O2 -g a.cpp && valgrind ./a.out 

I get the following error:

==2367== Memcheck, a memory error detector
==2367== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2367== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2367== Command: ./a.out
==2367== 
==2367== Conditional jump or move depends on uninitialised value(s)
==2367==    at 0x45AC3C:
std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone()
(regex_automaton.tcc:208)
==2367==    by 0x4341EA:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier()
(regex_compiler.tcc:253)
==2367==    by 0x432F67:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()
(regex_compiler.tcc:143)
==2367==    by 0x432B9A:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(regex_compiler.tcc:123)
==2367==    by 0x427E00:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()
(regex_compiler.tcc:99)
==2367==    by 0x42747E:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char
const*, char const*, std::locale const&,
std::regex_constants::syntax_option_type) (regex_compiler.tcc:84)
==2367==    by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const
char *> (regex_compiler.h:183)
==2367==    by 0x427149: std::__cxx11::basic_regex<char,
std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char
const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763)
==2367==    by 0x427025: basic_regex<const char *> (regex.h:507)
==2367==    by 0x427025: basic_regex (regex.h:440)
==2367==    by 0x427025: main (a.cpp:3)
==2367== 
==2367== Conditional jump or move depends on uninitialised value(s)
==2367==    at 0x45AC3C:
std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone()
(regex_automaton.tcc:208)
==2367==    by 0x434218:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier()
(regex_compiler.tcc:257)
==2367==    by 0x432F67:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()
(regex_compiler.tcc:143)
==2367==    by 0x432B9A:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(regex_compiler.tcc:123)
==2367==    by 0x427E00:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()
(regex_compiler.tcc:99)
==2367==    by 0x42747E:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char
const*, char const*, std::locale const&,
std::regex_constants::syntax_option_type) (regex_compiler.tcc:84)
==2367==    by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const
char *> (regex_compiler.h:183)
==2367==    by 0x427149: std::__cxx11::basic_regex<char,
std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char
const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763)
==2367==    by 0x427025: basic_regex<const char *> (regex.h:507)
==2367==    by 0x427025: basic_regex (regex.h:440)
==2367==    by 0x427025: main (a.cpp:3)
==2367== 
==2367== 
==2367== HEAP SUMMARY:
==2367==     in use at exit: 0 bytes in 0 blocks
==2367==   total heap usage: 20 allocs, 20 frees, 76,776 bytes allocated
==2367== 
==2367== All heap blocks were freed -- no leaks are possible
==2367== 
==2367== Use --track-origins=yes to see where uninitialised values come from
==2367== For lists of detected and suppressed errors, rerun with: -s
==2367== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

Any of the following actions remove the error: replacing clang++ with g++,
disabling -fsanitize=undefined, disabling -O2, switching to -stdlib=libc++.

Versions are:

clang version 10.0.0-4ubuntu1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

valgrind-3.15.0

libstdc++-10-dev/focal-updates,focal-security,now 10.2.0-5ubuntu1~20.04 amd64
[installed,automatic]

A friend of mine suggested that it's probably caused by use-after-move of
`__dup` in regex_automaton.tcc:206 (commit
e45c41988bfd655b1df7cff8fcf111dc6fb732e3 at GitHub mirror) and vaguely
suggested that maybe clang++ starts to implement some kind of destructive
moves:

          auto __id = _M_nfa._M_insert_state(std::move(__dup));
          __m[__u] = __id;
          if (__dup._M_has_alt())

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/98677] std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
  2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
@ 2021-01-14 14:17 ` redi at gcc dot gnu.org
  2021-01-14 14:21 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-01-14 14:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|11.0                        |10.2.0

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Egor Suvorov from comment #0)
> suggested that maybe clang++ starts to implement some kind of destructive
> moves:

That would not be permitted by the C++ standard.

I can't reproduce this with clang 10.0.1 and valgrind 3.16.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/98677] std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
  2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
  2021-01-14 14:17 ` [Bug libstdc++/98677] " redi at gcc dot gnu.org
@ 2021-01-14 14:21 ` redi at gcc dot gnu.org
  2021-10-09  0:46 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-01-14 14:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I don't think the use-after-move is the problem, because only the members of
the _State_base class are accessed after the move, and they are unaffected by
the move.

The move constructor does this:

      _State(_State&& __rhs) : _State_base(__rhs)
      {
        if (__rhs._M_opcode() == _S_opcode_match)
          new (this->_M_matcher_storage._M_addr())
            _MatcherT(std::move(__rhs._M_get_matcher()));
      }

So the _State_base::_M_opcode, _State_base::_M_alt and _State_base::_M_next
members of __rhs are not modified, and can be safely accessed after the move.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/98677] std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
  2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
  2021-01-14 14:17 ` [Bug libstdc++/98677] " redi at gcc dot gnu.org
  2021-01-14 14:21 ` redi at gcc dot gnu.org
@ 2021-10-09  0:46 ` redi at gcc dot gnu.org
  2021-10-09 12:46 ` egor_suvorov at mail dot ru
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-09  0:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I can't reproduce this error with any version of clang or gcc. No error is
shown by ubsan, asan or valgrind.

Please reopen if you can still reproduce this, and provide more details about
versions of the tools.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/98677] std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
  2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
                   ` (2 preceding siblings ...)
  2021-10-09  0:46 ` redi at gcc dot gnu.org
@ 2021-10-09 12:46 ` egor_suvorov at mail dot ru
  2021-10-09 14:46 ` redi at gcc dot gnu.org
  2021-10-12 20:01 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: egor_suvorov at mail dot ru @ 2021-10-09 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

--- Comment #4 from Egor Suvorov <egor_suvorov at mail dot ru> ---
Note that the issue requires both UBSanitizer and Valgrind enabled
simultaneously. Running with just one of them does not result in an error.

The versions are ones installed in Ubuntu 20.04.3 Focal Fossa (amd64) by
running `sudo apt install clang-10 valgrind` (you can get VM image from
https://www.osboxes.org/ubuntu-server/)

https://launchpad.net/ubuntu/focal/amd64/clang-10/1:10.0.0-4ubuntu1
https://launchpad.net/ubuntu/focal/amd64/valgrind/1:3.15.0-1ubuntu9.1

Unfortunately, I was unable to get more verbose output from either clang or
Valgrind. So it may be Ubuntu-specific issue.

Actually, the problem is not reproducible with clang-12, so it may be already
be fixed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/98677] std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
  2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
                   ` (3 preceding siblings ...)
  2021-10-09 12:46 ` egor_suvorov at mail dot ru
@ 2021-10-09 14:46 ` redi at gcc dot gnu.org
  2021-10-12 20:01 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-09 14:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Yes, I used both ubsan and valgrind, and couldn't reproduce it with any version
of GCC or clang I tried, including with the headers from GCC 10.2

I might try to reproduce it on Ubuntu, but if it's specific to Ubuntu then it
should be reported to them, not here. Upstream GCC doesn't reproduce the
problem.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/98677] std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move
  2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
                   ` (4 preceding siblings ...)
  2021-10-09 14:46 ` redi at gcc dot gnu.org
@ 2021-10-12 20:01 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-12 20:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I can reproduce it in a container using:

FROM ubuntu:20.04
RUN apt-get update
RUN echo 'tzdata tzdata/Areas select Europe' | debconf-set-selections
RUN echo 'tzdata tzdata/Zones/Europe select London' | debconf-set-selections
RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata
RUN apt-get -y install build-essential clang-10 valgrind
COPY test.cc /tmp/
RUN clang++-10 -fsanitize=undefined -O1 -g /tmp/test.cc -o /tmp/a.out &&
valgrind /tmp/a.out

But it's not caused by the move (it still happens if I copy the state object
instead of moving it). Initializing the _State_base::_M_match_storage buffer
doesn't help. Nothing I changed in the code helped.

I think it's a ubsan bug in clang 10 and clang 11.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-10-12 20:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-14 12:04 [Bug libstdc++/98677] New: std::regex constructor triggers valgrind under clang++ with undefined sanitizer; possible use-after-move egor_suvorov at mail dot ru
2021-01-14 14:17 ` [Bug libstdc++/98677] " redi at gcc dot gnu.org
2021-01-14 14:21 ` redi at gcc dot gnu.org
2021-10-09  0:46 ` redi at gcc dot gnu.org
2021-10-09 12:46 ` egor_suvorov at mail dot ru
2021-10-09 14:46 ` redi at gcc dot gnu.org
2021-10-12 20:01 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).