public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array
@ 2021-03-05 19:11 vanyacpp at gmail dot com
  2021-03-06 22:53 ` [Bug sanitizer/99418] " vanyacpp at gmail dot com
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: vanyacpp at gmail dot com @ 2021-03-05 19:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

            Bug ID: 99418
           Summary: sanitizer checks for accessing multidimentional
                    VLA-array
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: sanitizer
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vanyacpp at gmail dot com
                CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
                    jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org
  Target Milestone: ---

The example below accesses array past its size, but sanitizers don't show any
errors. If I change index m to m + 1 an error will be shown. This makes me
think that compiler does some checks, but perhaps they are incomplete for
multidimentional VLA-arrays.

GCC 10.2.

#include <string>

std::string shortest_match(size_t n, size_t m)
{
    std::string mas[n][m];
    mas[n - 1][m] = ""; // mas[n - 1][m + 1] will show an errors

    return mas[n - 1][m - 1];
}

int main()
{
    shortest_match(4, 3);
}

$ g++ -g -fsanitize=address,undefined -std=c++17 2.cpp && ./a.out 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==26974==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc
0x7f59ea2ad2d6 bp 0x000000000000 sp 0x7ffc78389ea0 T0)
==26974==The signal is caused by a WRITE memory access.
==26974==Hint: address points to the zero page.
    #0 0x7f59ea2ad2d6 in std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long,
unsigned long, char const*, unsigned long) (/lib/libstdc++.so.6+0x13c2d6)
    #1 0x401658 in shortest_match[abi:cxx11](unsigned long, unsigned long)
/home/ivan/2.cpp:6
    #2 0x4019eb in main /home/ivan/2.cpp:13
    #3 0x7f59e950ec7c in __libc_start_main (/lib/libc.so.6+0x23c7c)
    #4 0x4011a9 in _start (/home/ivan/a.out+0x4011a9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/libstdc++.so.6+0x13c2d6) in
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>::_M_replace(unsigned long, unsigned long, char const*, unsigned long)
==26974==ABORTING

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
@ 2021-03-06 22:53 ` vanyacpp at gmail dot com
  2021-03-07  7:44 ` vanyacpp at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: vanyacpp at gmail dot com @ 2021-03-06 22:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

--- Comment #1 from Ivan Sorokin <vanyacpp at gmail dot com> ---
Here is the reduced example. It doesn't SIGSEGV, but it doesn't report any
sanitizer errors either:

$ g++ -g -fsanitize=bounds 3.cpp
$ cat 3.cpp
#include <cstddef>

void escape(int& a)
{}

void test(size_t n, size_t m)
{
    int mas[n][m];
    escape(mas[n - 1][m]);
}

int main()
{
    test(4, 3);
}

Surprisingly if I replace taking a reference with writing to the array it will
show an error.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
  2021-03-06 22:53 ` [Bug sanitizer/99418] " vanyacpp at gmail dot com
@ 2021-03-07  7:44 ` vanyacpp at gmail dot com
  2021-03-08  9:15 ` marxin at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: vanyacpp at gmail dot com @ 2021-03-07  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

--- Comment #2 from Ivan Sorokin <vanyacpp at gmail dot com> ---
It looks like this is related to ignore_off_by_one parameter of
ubsan_instrument_bounds.

As can be seen in gimple the problematic .UBSAN_BOUNDS checks against array
size plus 1.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
  2021-03-06 22:53 ` [Bug sanitizer/99418] " vanyacpp at gmail dot com
  2021-03-07  7:44 ` vanyacpp at gmail dot com
@ 2021-03-08  9:15 ` marxin at gcc dot gnu.org
  2021-03-08 10:14 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-03-08  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-03-08
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
> Surprisingly if I replace taking a reference with writing to the array it
> will show an error.

Yes, ASAN instruments memory reads and writes, so your test-case reports ASAN
errors with:

void escape(int &a)
{
  a = 123;
}

$ g++ pr99418.C -fsanitize=address && ./a.out 
=================================================================
==7912==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address
0x7fffffffde90 at pc 0x000000400869 bp 0x7fffffffde10 sp 0x7fffffffde08
WRITE of size 4 at 0x7fffffffde90 thread T0
    #0 0x400868 in escape(int&)
(/home/marxin/Programming/testcases/a.out+0x400868)
    #1 0x4009c0 in test(unsigned long, unsigned long)
(/home/marxin/Programming/testcases/a.out+0x4009c0)
    #2 0x400a06 in main (/home/marxin/Programming/testcases/a.out+0x400a06)
    #3 0x7ffff708db24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #4 0x40077d in _start (/home/marxin/Programming/testcases/a.out+0x40077d)

Address 0x7fffffffde90 is located in stack of thread T0
SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow
(/home/marxin/Programming/testcases/a.out+0x400868) in escape(int&)
Shadow bytes around the buggy address:
  0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7bc0: 00 00 00 00 00 00 00 00 ca ca ca ca 00 00 00 00
=>0x10007fff7bd0: 00 00[cb]cb cb cb cb cb 00 00 00 00 00 00 00 00
  0x10007fff7be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==7912==ABORTING

That said, can we close it as resolved?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
                   ` (2 preceding siblings ...)
  2021-03-08  9:15 ` marxin at gcc dot gnu.org
@ 2021-03-08 10:14 ` jakub at gcc dot gnu.org
  2021-03-08 18:23 ` msebor at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-03-08 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Asan can't by design detect neither #c0 nor #c1, only ubsan can.
The reason why ubsan has that off by one stuff is that in C/C++,
&mas[n - 1][m] is not undefined behavior, only mas[n - 1][m] is.
And with classes, it actually means calling some method with &mas[n - 1][m]
argument.
For #c1, the big question is what exactly is UB in C++, whether already binding
a reference to the object after the end of the array or only actually accessing
that reference.  If the former, ubsan could treat REFERENCE_TYPE differently,
if the latter, then I'm afraid it can't do that, and ubsan by design has to be
done early before all the optimizations change the IL so much that it is
completely lost what were the user errors in it.
For the method calls, there really isn't a reference in the IL either, this
argument is a pointer, but .UBSAN_BOUNDS calls are added in the FE and so
perhaps it could know it is a method call and treat it as a reference.
So, something can be done but we need answers on where the UB in C++ exactly
happens.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
                   ` (3 preceding siblings ...)
  2021-03-08 10:14 ` jakub at gcc dot gnu.org
@ 2021-03-08 18:23 ` msebor at gcc dot gnu.org
  2021-03-09  8:39 ` vanyacpp at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: msebor at gcc dot gnu.org @ 2021-03-08 18:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msebor at gcc dot gnu.org

--- Comment #5 from Martin Sebor <msebor at gcc dot gnu.org> ---
The initialization of a reference binds it to the value of the initializer
expression.  In both examples the [value of the] initializer expression is
undefined because it doesn't refer to an element of the array.  One way to see
that is in a constexpr context which rejects undefined constructs with an error
(only with Clang; GCC doesn't reject invalid initialization of references
there, see also pr70151):

$ cat t.C && clang -S t.C
constexpr int a[2] = { 1, 2 };
constexpr const int &r = a[2];
t.C:2:26: warning: array index 2 is past the end of the array (which contains 2
      elements) [-Warray-bounds]
constexpr const int &r = a[2];
                         ^ ~
t.C:1:1: note: array 'a' declared here
constexpr int a[2] = { 1, 2 };
^
t.C:2:22: error: constexpr variable 'r' must be initialized by a constant
      expression
constexpr const int &r = a[2];
                     ^   ~~~~
t.C:2:22: note: dereferenced pointer past the end of subobject of 'a' is not a
      constant expression
t.C:1:15: note: declared here
constexpr int a[2] = { 1, 2 };
              ^
1 warning and 1 error generated.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
                   ` (4 preceding siblings ...)
  2021-03-08 18:23 ` msebor at gcc dot gnu.org
@ 2021-03-09  8:39 ` vanyacpp at gmail dot com
  2021-03-09  8:47 ` vanyacpp at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: vanyacpp at gmail dot com @ 2021-03-09  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

--- Comment #6 from Ivan Sorokin <vanyacpp at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> Asan can't by design detect neither #c0 nor #c1, only ubsan can.
> The reason why ubsan has that off by one stuff is that in C/C++,
> &mas[n - 1][m] is not undefined behavior, only mas[n - 1][m] is.

That is very unfortunate. For standard containers subscripting with wrond
index is undefined behavior no matter if it is followed by taking of address.
I assumed the same rules apply for builtin arrays. If one need just a point
one can easily write a + n instead of &a[n]. Now I see that this is not the
case and built-in arrays behave differently.

> For #c1, the big question is what exactly is UB in C++, whether already
> binding a reference to the object after the end of the array or only
> actually accessing that reference.  If the former, ubsan could treat
> REFERENCE_TYPE differently, if the latter, then I'm afraid it can't do that,
> and ubsan by design has to be done early before all the optimizations change
> the IL so much that it is completely lost what were the user errors in it.
> For the method calls, there really isn't a reference in the IL either, this
> argument is a pointer, but .UBSAN_BOUNDS calls are added in the FE and so
> perhaps it could know it is a method call and treat it as a reference.
> So, something can be done but we need answers on where the UB in C++ exactly
> happens.

For -fsanitize=null the rules are quite subtle: dereferencing by itself (*p)
doesn't check for nullptr, but binding a reference (int& q = *p;) does.
Perhaps similar rules can be employed for past-the-end element: taking pointer
to it is fine, but passing the pointer as this parameter to function is UB? At
least this would be consistent with null pointers.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
                   ` (5 preceding siblings ...)
  2021-03-09  8:39 ` vanyacpp at gmail dot com
@ 2021-03-09  8:47 ` vanyacpp at gmail dot com
  2021-03-09  8:54 ` vanyacpp at gmail dot com
  2021-03-09 15:48 ` [Bug sanitizer/99418] more cases where -fsanitize=bounds can check one-past-the-end accesses msebor at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: vanyacpp at gmail dot com @ 2021-03-09  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

--- Comment #7 from Ivan Sorokin <vanyacpp at gmail dot com> ---
(In reply to Martin Liška from comment #3)

> That said, can we close it as resolved?

I'm sorry for not being clear from the beginning. The original report was about
-fsanitize=bounds sanitizer which sometimes allows accessing one past the end
element.

Now after #c4 I see that language rules make it excessively complicated for
compiler to do this.

I believe that one past the end is important error to check for, but I
understand why compilers might choose to avoid doing it. Feel free to close the
issue if implementing it is infeasible.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] sanitizer checks for accessing multidimentional VLA-array
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
                   ` (6 preceding siblings ...)
  2021-03-09  8:47 ` vanyacpp at gmail dot com
@ 2021-03-09  8:54 ` vanyacpp at gmail dot com
  2021-03-09 15:48 ` [Bug sanitizer/99418] more cases where -fsanitize=bounds can check one-past-the-end accesses msebor at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: vanyacpp at gmail dot com @ 2021-03-09  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

--- Comment #8 from Ivan Sorokin <vanyacpp at gmail dot com> ---
If I understand #c5 correctly the minimal reproducer should be this:

void g(int&);

void f()
{
    int a[10];
    int& p = a[10]; // (1)
    g(a[10]);       // (2)
}

Both (1) and (2) are undefined and -fsanitize=bounds can help checking this.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug sanitizer/99418] more cases where -fsanitize=bounds can check one-past-the-end accesses
  2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
                   ` (7 preceding siblings ...)
  2021-03-09  8:54 ` vanyacpp at gmail dot com
@ 2021-03-09 15:48 ` msebor at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: msebor at gcc dot gnu.org @ 2021-03-09 15:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99418

Martin Sebor <msebor at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |56456
             Status|WAITING                     |NEW
           Keywords|                            |diagnostic

--- Comment #9 from Martin Sebor <msebor at gcc dot gnu.org> ---
I don't know enough about the sanitizer to judge how difficult it might be to
handle this case but a patch I posted in November
(https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558775.html) enhances
-Warray-bounds to diagnose the test cases in comment #8:

$ g++ -O2 -S -Wall pr99418-c8.C
pr99418-c8.C: In function ‘void f()’:
pr99418-c8.C:7:10: warning: unused variable ‘p’ [-Wunused-variable]
    7 |     int& p = a[10]; // (1)
      |          ^
pr99418-c8.C:8:6: warning: subscript 10 in argument 1 to ‘void g(int&)’ is just
past the end of ‘int [10]’ [-Warray-bounds]
    8 |     g(a[10]);       // (2)
      |     ~^~~~~~~
pr99418-c8.C:6:9: note: at offset 40 into source object ‘a’ of size 40
    6 |     int a[10];
      |         ^

The enhancement doesn't handle multidimensional VLAs like those in comment #1
but it looks like the IL has enough information to make the detection possible:

int main ()
{
  int[0:D.2590][0:D.2587] * mas.4;
  int * _4;

  <bb 2> [local count: 1073741824]:
  mas.4_3 = __builtin_alloca_with_align (48, 32);
  _4 = &MEM <int[0:D.2590][0:D.2587]> [(int[0:D.2569][0:D.2565]
*)mas.4_3][3]{lb: 0 sz: 12}[3];
  escape (_4);
  return 0;

}

With that let me confirm this request for both -Warray-bounds (I'll resubmit
the patch for GCC 12) and for the sanitizer.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
[Bug 56456] [meta-bug] bogus/missing -Warray-bounds

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-03-09 15:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-05 19:11 [Bug sanitizer/99418] New: sanitizer checks for accessing multidimentional VLA-array vanyacpp at gmail dot com
2021-03-06 22:53 ` [Bug sanitizer/99418] " vanyacpp at gmail dot com
2021-03-07  7:44 ` vanyacpp at gmail dot com
2021-03-08  9:15 ` marxin at gcc dot gnu.org
2021-03-08 10:14 ` jakub at gcc dot gnu.org
2021-03-08 18:23 ` msebor at gcc dot gnu.org
2021-03-09  8:39 ` vanyacpp at gmail dot com
2021-03-09  8:47 ` vanyacpp at gmail dot com
2021-03-09  8:54 ` vanyacpp at gmail dot com
2021-03-09 15:48 ` [Bug sanitizer/99418] more cases where -fsanitize=bounds can check one-past-the-end accesses msebor at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).