public inbox for gcc-bugs@sourceware.org
* [Bug sanitizer/99814] New: regexec fails with -fsanitize=address
@ 2021-03-29  9:44 stefansf at linux dot ibm.com
  2021-03-30  7:36 ` [Bug sanitizer/99814] " marxin at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: stefansf at linux dot ibm.com @ 2021-03-29  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

            Bug ID: 99814
           Summary: regexec fails with -fsanitize=address
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: sanitizer
          Assignee: unassigned at gcc dot gnu.org
          Reporter: stefansf at linux dot ibm.com
                CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
                    jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org
  Target Milestone: ---
            Target: s390x

Testing against today's commit

https://gcc.gnu.org/g:d579e2e76f9469e1b386d693af57c5c4f0ede410

on s390x we have:

$ gcc pr98920.c -fsanitize=address && ./a.out
failed to match

The testcase succeeds without `-fsanitize=address`.

In GDB I see that the address loaded from _ZN14__interception12real_regexecE
equals the address of regexec@GLIBC_2.2, which explains why the testcase fails.
Without `-fsanitize=address`, regexec@@GLIBC_2.3.4 is executed instead.


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
@ 2021-03-30  7:36 ` marxin at gcc dot gnu.org
  2021-03-30  8:39 ` stefansf at linux dot ibm.com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-03-30  7:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-03-30

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Thanks for the report. Hm, it's strange as we should request exactly this
version of the symbol through the following code path:

  COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN(regexec, "GLIBC_2.3.4");

#ifdef __GLIBC__
// If we could not find the versioned symbol, fall back to an unversioned
// lookup. This is needed to work around a GLibc bug that causes dlsym
// with RTLD_NEXT to return the oldest versioned symbol.
// See https://sourceware.org/bugzilla/show_bug.cgi?id=14932.
// For certain symbols (e.g. regexec) we have to perform a versioned lookup,
// but that versioned symbol will only exist for architectures where the
// oldest Glibc version pre-dates support for that architecture.
// For example, regexec@GLIBC_2.3.4 exists on x86_64, but not RISC-V.
// See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920.
#define COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN(fn, ver) \
  COMMON_INTERCEPT_FUNCTION_VER_UNVERSIONED_FALLBACK(fn, ver)
#else
#define COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN(fn, ver) \
  COMMON_INTERCEPT_FUNCTION(fn)
#endif

#define ASAN_INTERCEPT_FUNC_VER_UNVERSIONED_FALLBACK(name, ver)              \
  do {                                                                       \
    if (!INTERCEPT_FUNCTION_VER(name, ver) && !INTERCEPT_FUNCTION(name))     \
      VReport(1, "AddressSanitizer: failed to intercept '%s@@%s' or '%s'\n", \
              #name, #ver, #name);                                           \
  } while (0)


Can you please check in a debugger whether INTERCEPT_FUNCTION_VER really fails?
I'm sorry, but I don't have an s390 machine at hand.


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
  2021-03-30  7:36 ` [Bug sanitizer/99814] " marxin at gcc dot gnu.org
@ 2021-03-30  8:39 ` stefansf at linux dot ibm.com
  2021-03-30  8:58 ` marxin at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: stefansf at linux dot ibm.com @ 2021-03-30  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

--- Comment #2 from Stefan Schulze Frielinghaus <stefansf at linux dot ibm.com> ---
Breakpoint 4, __interception::InterceptFunction (name=0x3fffd61e8f2 "regexec",
    ver=0x3fffd61eb7e "GLIBC_2.3.4",
    ptr_to_real=0x3fffd677d08 <__interception::real_regexec>, func=16779728,
    wrapper=4398001883504)
    at /devel/gcc-4/src/libsanitizer/interception/interception_linux.cpp:74
74        void *addr = GetFuncAddr(name, ver);

At the end of InterceptFunction we have:

(gdb) print addr
$1 = (void *) 0x3fffd2e9110 <__GI___regexec>

The address itself also looks right to me, i.e.,
`readelf -s /lib64/libc.so.6 | grep regexec` results in:
   279: 00000000000e9110   344 FUNC    GLOBAL DEFAULT   13 regexec@@GLIBC_2.3.4
...
 25156: 00000000000e9110   344 FUNC    LOCAL  DEFAULT   13 __GI___regexec

However, the variables func and wrapper differ:

(gdb) print func
$2 = 16779728
(gdb) print wrapper
$3 = 4398001883504

so InterceptFunction returns false.


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
  2021-03-30  7:36 ` [Bug sanitizer/99814] " marxin at gcc dot gnu.org
  2021-03-30  8:39 ` stefansf at linux dot ibm.com
@ 2021-03-30  8:58 ` marxin at gcc dot gnu.org
  2021-03-30 11:33 ` stefansf at linux dot ibm.com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-03-30  8:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
Strange. Please report it upstream:
https://github.com/google/sanitizers/issues

and CC people from https://reviews.llvm.org/D96348


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
                   ` (2 preceding siblings ...)
  2021-03-30  8:58 ` marxin at gcc dot gnu.org
@ 2021-03-30 11:33 ` stefansf at linux dot ibm.com
  2021-03-30 12:06 ` Alexander.Richardson at cl dot cam.ac.uk
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: stefansf at linux dot ibm.com @ 2021-03-30 11:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

--- Comment #4 from Stefan Schulze Frielinghaus <stefansf at linux dot ibm.com> ---
Thanks for the pointers!  I reported it upstream as issue 1390:
https://github.com/google/sanitizers/issues/1390


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
                   ` (3 preceding siblings ...)
  2021-03-30 11:33 ` stefansf at linux dot ibm.com
@ 2021-03-30 12:06 ` Alexander.Richardson at cl dot cam.ac.uk
  2021-03-30 12:09 ` marxin at gcc dot gnu.org
  2021-03-30 12:39 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: Alexander.Richardson at cl dot cam.ac.uk @ 2021-03-30 12:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

Alex Richardson <Alexander.Richardson at cl dot cam.ac.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Alexander.Richardson at cl dot cam
                   |                            |.ac.uk

--- Comment #5 from Alex Richardson <Alexander.Richardson at cl dot cam.ac.uk> ---
Does the sanitizer runtime library include the https://reviews.llvm.org/D96348
patch?

IMO the real issue is that dlsym() with RTLD_NEXT selects the oldest versioned
symbol. Not sure why that behaviour was chosen.
I'm sure there are lots of other sanitizer interceptors that are also affected
by https://sourceware.org/bugzilla/show_bug.cgi?id=1319.


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
                   ` (4 preceding siblings ...)
  2021-03-30 12:06 ` Alexander.Richardson at cl dot cam.ac.uk
@ 2021-03-30 12:09 ` marxin at gcc dot gnu.org
  2021-03-30 12:39 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-03-30 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

--- Comment #6 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Alex Richardson from comment #5)
> Does the sanitizer runtime library include the
> https://reviews.llvm.org/D96348 patch?

Yes, the change was merged into GCC master some time ago.

> 
> IMO the real issue is that dlsym() with RTLD_NEXT selects the oldest
> versioned symbol. Not sure why that behaviour was chosen.
> I'm sure there are lots of other sanitizer interceptors that are also
> affected by https://sourceware.org/bugzilla/show_bug.cgi?id=1319.

Shouldn't dlvsym return exactly one symbol in this case? Can't we rely on
that?


* [Bug sanitizer/99814] regexec fails with -fsanitize=address
  2021-03-29  9:44 [Bug sanitizer/99814] New: regexec fails with -fsanitize=address stefansf at linux dot ibm.com
                   ` (5 preceding siblings ...)
  2021-03-30 12:09 ` marxin at gcc dot gnu.org
@ 2021-03-30 12:39 ` jakub at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-03-30 12:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Alex Richardson from comment #5)
> Does the sanitizer runtime library include the
> https://reviews.llvm.org/D96348 patch?
> 
> IMO the real issue is that dlsym() with RTLD_NEXT selects the oldest
> versioned symbol. Not sure why that behaviour was chosen.
> I'm sure there are lots of other sanitizer interceptors that are also
> affected by https://sourceware.org/bugzilla/show_bug.cgi?id=1319.

dlsym behavior matches the behavior of normal symbol lookup resolution.
When glibc (or some other libraries) started out, it was unversioned; symbol
versions were added to it later.  Libraries or binaries linked against the
very old glibc use unversioned symbols, and for ABI compatibility those
naturally need to be resolved against the oldest symbol version.
Libraries/binaries linked against newer glibc versions have versioned
symbols and use both the symbol name and the symbol version in symbol lookup
(i.e. as dlvsym does).
With dlsym, one doesn't really know in which era the library or binary was
linked or what it expects; it could be a very old binary, a newer one, or the
most recent, and if the same symbol has multiple symbol versions, which one to
choose is unknown.  So, for symbols with more than one symbol version, one
should use dlvsym instead of dlsym.
Ideally, the libsanitizer shared libraries would be symbol versioned: for their
own APIs with some sanitizer-specific symbol version(s), and for the symbols
they intercept from glibc with the symbol versions of the glibc they were
configured against.  For symbols with multiple symbol versions there should be
multiple interceptors, which, if they call the intercepted function, should use
dlvsym.
That would mean scanning the glibc symbol versions at library configure time
and deciding on the *san version scripts and predefined macros based on that.

