public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* array bounds, sanitizer, safe programming, and cilk array notation
@ 2015-01-26 19:54 Martin Uecker
  2015-01-27  0:08 ` Joseph Myers
  2015-02-21 15:21 ` Marek Polacek
  0 siblings, 2 replies; 7+ messages in thread
From: Martin Uecker @ 2015-01-26 19:54 UTC (permalink / raw)
  To: gcc Mailing List
  Cc: Jeff Law, Joseph Myers, Jakub Jelinek, Marek Polacek,
	Florian Weimer, Balaji V. Iyer


Hi all,

I am writing numerical code, so I am trying to make the use 
of arrays in C (with gcc) suck a bit less. In general, the long term
goal would be to have either a compile-time warning or the possibility
to get a run-time error if one writes beyond the end of an array as 
specified by its type.

So one example (see below) I looked at is where I pass an array of
too small size to a function, to see how see can be diagnosed. In some
cases, we can get a runtime error with the address sanitizer, but this
is fairly limited (e.g. it does not work when the array is embedded
in a struct) and I also got mixed results when the function
is inlined.

For pointers to arrays with static size one can get an "incompatible
pointer" warning - which is nice. With clang, I also get warning for 
pointers which are declared as array parameters and use the 'static' 
keyword to specify a minimum size. This a diagnostic we are currently
missing.

The next step would be to have diagnostics also for the VLA
case if the size is known at compilation time (as in the
example below) and a run-time error when it is not (maybe 
with the undefined behaviour sanitizer?).

If we have the later, I think this might also help with safer 
programming in C, because one would get either a compile time or 
runtime error when I passing a buffer which is too small to
a function. For example, snprintf could have a prototype like
this:

int snprintf(size_t size; char str[static size], size_t size, 
  const char *format, ...);

That VLAs essentially provide the bounded pointer type C is
missing has been pointed out before, e.g. there was a proposal
by John Nagle, although he proposed rather intrusive language
changes (e.g. adding references to C) which are not necessary
in my opinion:

https://gcc.gnu.org/ml/gcc/2012-08/msg00360.html


Finally, what is missing is a way to diagnose problems inside
the called functions. -Warray-bounds=2 (with my recently
accepted patch) helps with this, but - so far - only for static 
arrays:

void foo(int (*x)[4])
{
	(*x)[4] = 5;	// warning
}

It would be nice to also have these warning and runtime errors
(with the undefined behaviour sanitizer) for VLAs. 

Finally, I think we should have corresponding warning also
for pointers which are declared as array parameters:

void foo2(int x[4])
{
	x[4] = 5;
}

The later does not currently produce a warning, because x
is converted to a pointer and the length is ignored. 

If it is not possible to have warning here for compatibility
reasons, one possibility is to have an extension similar to 
'static' which makes 'x' a proper array in the callee, e.g. 
something like:

void foo2(int x[array 4])
{
	// x is now of type int[4] and not int*

	x[4] = 5; // error
}

The semantics would be that the array is still passed as
a pointer but the type of x would be int[4]. Because it
immediately decays into a pointer when used, no code generation
changes would be required (except maybe when looking at the
type with sizeof and _Generic).

Another reason I like this is because Cilk array notation
currently requires the length to be specified for 'x' because
it is a pointer and not an array. If x would be an array,
something like this would work:

void foo2(int x[array 4])
{
	x[:] = 1;
}

In fact, the documentation for Cilk as such examples
(without the array keyword), and I guess this works on the
intel compiler but not on gcc.

I am willing to spend some (limited) time on all of this, but
I thought I ask for comments first. I appreciate any feedback,
suggestions, and help!

Cheers,
Martin




// file 1
extern void bar(int x[static 5])
{
	for (int i = 0; i < 5; i++)
		x[i] = 1;
}

extern void bar2(int (*x)[5])
{
	for (int i = 0; i < 5; i++)
		(*x)[i] = 1;
}

// file 2

#include <stdio.h>



extern void bar(int x[static 5]);
extern void bar2(int (*x)[5]);




int main()
{
	int x[4] = { 0 };
	bar(x);		// warning only with clang (found by asan)
	bar2(&x);	// warning (found by asan)
	
	int c = 4;
	int y[c];
	
	for (int i = 0; i < c; i++)
		y[i] = 0;

	bar(y);		// not diagnosed (found by asan)
	bar2(&y);	// not diagnosed (found by asan)

	struct foo {
		int z[4];
		int bar;
	} zz = { { 0 }, 0 };

	bar(zz.z);	// warning only with clang
	bar2(&zz.z);	// warning

	printf("%d %d %d %d\n", x[0], x[1], x[2], x[3]);
	printf("%d %d %d %d\n", y[0], y[1], y[2], y[3]);
	printf("%d %d %d %d\n", zz.z[0], zz.z[1], zz.z[2], zz.z[3]);
}





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: array bounds, sanitizer, safe programming, and cilk array notation
  2015-01-26 19:54 array bounds, sanitizer, safe programming, and cilk array notation Martin Uecker
@ 2015-01-27  0:08 ` Joseph Myers
  2015-02-21 15:41   ` Marek Polacek
  2015-02-21 15:21 ` Marek Polacek
  1 sibling, 1 reply; 7+ messages in thread
From: Joseph Myers @ 2015-01-27  0:08 UTC (permalink / raw)
  To: Martin Uecker
  Cc: gcc Mailing List, Jeff Law, Jakub Jelinek, Marek Polacek,
	Florian Weimer, Balaji V. Iyer

On Mon, 26 Jan 2015, Martin Uecker wrote:

> extern void bar2(int (*x)[5]);

> 	int c = 4;
> 	int y[c];

> 	bar2(&y);	// not diagnosed (found by asan)

This is the undefined behavior "If the two array types are used in a 
context which requires them to be compatible, it is undefined behavior if 
the two size specifiers evaluate to unequal values." (C11 6.7.6.2#6).  
Yes, it would make sense for ubsan to detect this.  Generally, most forms 
of runtime undefined behavior listed in J.2 should have ubsan detection 
unless hard to detect / detected by some other sanitizer such as asan.

Does adding new forms of sanitization require upstream libsanitizer 
changes as well or can arbitrary ubsan checks be added without needing 
libsanitizer changes?

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: array bounds, sanitizer, safe programming, and cilk array notation
  2015-01-26 19:54 array bounds, sanitizer, safe programming, and cilk array notation Martin Uecker
  2015-01-27  0:08 ` Joseph Myers
@ 2015-02-21 15:21 ` Marek Polacek
  2015-02-22  0:41   ` Martin Uecker
  1 sibling, 1 reply; 7+ messages in thread
From: Marek Polacek @ 2015-02-21 15:21 UTC (permalink / raw)
  To: Martin Uecker
  Cc: gcc Mailing List, Jeff Law, Joseph Myers, Jakub Jelinek,
	Florian Weimer, Balaji V. Iyer

Sorry for late reply - I've found this in my inbox only today.

On Mon, Jan 26, 2015 at 11:53:59AM -0800, Martin Uecker wrote:
> 
> Hi all,
> 
> I am writing numerical code, so I am trying to make the use 
> of arrays in C (with gcc) suck a bit less. In general, the long term
> goal would be to have either a compile-time warning or the possibility
> to get a run-time error if one writes beyond the end of an array as 
> specified by its type.
> 
> So one example (see below) I looked at is where I pass an array of
> too small size to a function, to see how see can be diagnosed. In some
> cases, we can get a runtime error with the address sanitizer, but this
> is fairly limited (e.g. it does not work when the array is embedded
> in a struct) and I also got mixed results when the function
> is inlined.
> 
> For pointers to arrays with static size one can get an "incompatible
> pointer" warning - which is nice. With clang, I also get warning for 
> pointers which are declared as array parameters and use the 'static' 
> keyword to specify a minimum size. This a diagnostic we are currently
> missing.
> 
> The next step would be to have diagnostics also for the VLA
> case if the size is known at compilation time (as in the
> example below) and a run-time error when it is not (maybe 
> with the undefined behaviour sanitizer?).
> 
> If we have the later, I think this might also help with safer 
> programming in C, because one would get either a compile time or 
> runtime error when I passing a buffer which is too small to
> a function. For example, snprintf could have a prototype like
> this:
> 
> int snprintf(size_t size; char str[static size], size_t size, 
>   const char *format, ...);
> 
> That VLAs essentially provide the bounded pointer type C is
> missing has been pointed out before, e.g. there was a proposal
> by John Nagle, although he proposed rather intrusive language
> changes (e.g. adding references to C) which are not necessary
> in my opinion:
> 
> https://gcc.gnu.org/ml/gcc/2012-08/msg00360.html
> 
> 
> Finally, what is missing is a way to diagnose problems inside
> the called functions. -Warray-bounds=2 (with my recently
> accepted patch) helps with this, but - so far - only for static 
> arrays:
> 
> void foo(int (*x)[4])
> {
> 	(*x)[4] = 5;	// warning
> }
 
This is detected by -fsanitize=object-size, turned on by default in
-fsanitize=undefined.  Since it makes use of __builtin_object_size,
it is necessary to compile with optimizations turned on.

> It would be nice to also have these warning and runtime errors
> (with the undefined behaviour sanitizer) for VLAs. 
> 
> Finally, I think we should have corresponding warning also
> for pointers which are declared as array parameters:
> 
> void foo2(int x[4])
> {
> 	x[4] = 5;
> }
 
Ditto.

> The later does not currently produce a warning, because x
> is converted to a pointer and the length is ignored. 
> 
> If it is not possible to have warning here for compatibility
> reasons, one possibility is to have an extension similar to 
> 'static' which makes 'x' a proper array in the callee, e.g. 

I think even the 'static in parameter array declarators' (ugly) C99 extension
isn't properly implemented yet (I don't think that the compiler makes
any optimization based on it).

So - if I understood this correctly - I think it's better to enhance
ubsan than to add any kind of language extensions.

	Marek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: array bounds, sanitizer, safe programming, and cilk array notation
  2015-01-27  0:08 ` Joseph Myers
@ 2015-02-21 15:41   ` Marek Polacek
  2015-02-23 17:53     ` Joseph Myers
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Polacek @ 2015-02-21 15:41 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Martin Uecker, gcc Mailing List, Jeff Law, Jakub Jelinek,
	Florian Weimer, Balaji V. Iyer

Sorry for late reply.

On Tue, Jan 27, 2015 at 12:07:58AM +0000, Joseph Myers wrote:
> On Mon, 26 Jan 2015, Martin Uecker wrote:
> 
> > extern void bar2(int (*x)[5]);
> 
> > 	int c = 4;
> > 	int y[c];
> 
> > 	bar2(&y);	// not diagnosed (found by asan)
> 
> This is the undefined behavior "If the two array types are used in a 
> context which requires them to be compatible, it is undefined behavior if 
> the two size specifiers evaluate to unequal values." (C11 6.7.6.2#6).  
> Yes, it would make sense for ubsan to detect this.  Generally, most forms 
> of runtime undefined behavior listed in J.2 should have ubsan detection 
> unless hard to detect / detected by some other sanitizer such as asan.
 
I have created a table to that effect some time ago:
http://people.redhat.com/mpolacek/www/analyzability.html
Obviously the question marks should be replaced by a -fsanitize=
option that detects a particular UB.  Or say that a particular UB is a
compile-time error (e.g. "declaring a function at block scope with an explicit
storage-class specifier other than extern").

I don't know what to do with the UBs on the library side - those 7.* ones.

> Does adding new forms of sanitization require upstream libsanitizer 
> changes as well or can arbitrary ubsan checks be added without needing 
> libsanitizer changes?

I think we also need libubsan changes.  But it is usually just about
printing an error message along with some values - nothing terribly
complex.

	Marek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: array bounds, sanitizer, safe programming, and cilk array notation
  2015-02-21 15:21 ` Marek Polacek
@ 2015-02-22  0:41   ` Martin Uecker
  2015-02-24  1:46     ` questionable checks for flexible array members in c-ubsan.c and tree-vrp.c (was: Re: array bounds, sanitizer, safe programming, and cilk array notation) Martin Uecker
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Uecker @ 2015-02-22  0:41 UTC (permalink / raw)
  To: Marek Polacek
  Cc: gcc Mailing List, Jeff Law, Joseph Myers, Jakub Jelinek,
	Florian Weimer, Balaji V. Iyer



Marek Polacek <polacek@redhat.com>:

> Sorry for late reply - I've found this in my inbox only today.
> 
> On Mon, Jan 26, 2015 at 11:53:59AM -0800, Martin Uecker wrote:
>
> > Finally, what is missing is a way to diagnose problems inside
> > the called functions. -Warray-bounds=2 (with my recently
> > accepted patch) helps with this, but - so far - only for static 
> > arrays:
> > 
> > void foo(int (*x)[4])
> > {
> > 	(*x)[4] = 5;	// warning
> > }
>  
> This is detected by -fsanitize=object-size, turned on by default in
> -fsanitize=undefined.  Since it makes use of __builtin_object_size,
> it is necessary to compile with optimizations turned on.

Not always.  -fsanitize=object-size  does not seem to detect the
illegal out-of-bound access by itself, but instead uses some very 
coarse notion of object size. For example, the following was not
detected in my tests:


void foo(int (*f)[4])
{
        (*f)[4] = 1;	// warning
}

void bar(int n, int (*f)[n])
{
        (*f)[n] = 1;	// no warning
}

...
struct { int buf[4]; int x; } s;
foo(&s.buf);
bar(4, &s.buf);



Also in cases like this a compile-time warning/error is possible.
A compile-time warning is more useful, and I think -fsanitize should
not be a replacement for proper compile time warnings/errors.

Instead, I think it should catch exactly those cases, which cannot
be detected a compile time (i.e. VLA with size not known at compile
time) - so that the combination of -fsanitize with compile time errors
gives strict checking without unnecessary runtime checks.

... 

> > The later does not currently produce a warning, because x
> > is converted to a pointer and the length is ignored. 
> > 
> > If it is not possible to have warning here for compatibility
> > reasons, one possibility is to have an extension similar to 
> > 'static' which makes 'x' a proper array in the callee, e.g. 
> 
> I think even the 'static in parameter array declarators' (ugly) C99 extension
> isn't properly implemented yet (I don't think that the compiler makes
> any optimization based on it).

This is correct, it doesn't do anything now.

(I am still puzzled why the C committee did not just make the
parameter have a proper array type whenever a size is specified.
So would have been so much better and is what most programmers
would expect.)

> So - if I understood this correctly - I think it's better to enhance
> ubsan than to add any kind of language extensions.

ubsan should be enhanced to detect the cases which cannot
be detected at compile time. But it needs be able to detect the
out-of-bounds access directly, and not just access outside
some object which might be much larger when the array
is embedded in a larger object. 

The 'static' keyword only garantuees that the pointer points
to an array of a certain minimum size. So this can be used
to check that arrays of the required size are passed to
a function (and clang already emits a warning at a compile
time when it detects this). But 'static' does not mean
that accesses after the specified size are out-of-bounds
(at they would be for proper arrays). To achieve this
a language extension would be required. 

Martin




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: array bounds, sanitizer, safe programming, and cilk array notation
  2015-02-21 15:41   ` Marek Polacek
@ 2015-02-23 17:53     ` Joseph Myers
  0 siblings, 0 replies; 7+ messages in thread
From: Joseph Myers @ 2015-02-23 17:53 UTC (permalink / raw)
  To: Marek Polacek
  Cc: Martin Uecker, gcc Mailing List, Jeff Law, Jakub Jelinek,
	Florian Weimer, Balaji V. Iyer

On Sat, 21 Feb 2015, Marek Polacek wrote:

> option that detects a particular UB.  Or say that a particular UB is a
> compile-time error (e.g. "declaring a function at block scope with an explicit
> storage-class specifier other than extern").

That one is already a hard error for cases such as static / register, a 
defined extension (nested functions) for auto.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* questionable checks for flexible array members in c-ubsan.c and tree-vrp.c (was:  Re: array bounds, sanitizer, safe programming, and cilk array notation)
  2015-02-22  0:41   ` Martin Uecker
@ 2015-02-24  1:46     ` Martin Uecker
  0 siblings, 0 replies; 7+ messages in thread
From: Martin Uecker @ 2015-02-24  1:46 UTC (permalink / raw)
  To: Martin Uecker
  Cc: Marek Polacek, gcc Mailing List, Jeff Law, Joseph Myers,
	Jakub Jelinek, Florian Weimer, Richard Biener


Martin Uecker <uecker@eecs.berkeley.edu>:
> Marek Polacek <polacek@redhat.com>:
> 
> > > void foo(int (*x)[4])
> > > {
> > > 	(*x)[4] = 5;	// warning
> > > }
> >  
> > This is detected by -fsanitize=object-size, turned on by default in
> > -fsanitize=undefined.  Since it makes use of __builtin_object_size,
> > it is necessary to compile with optimizations turned on.
> 
> Not always.  -fsanitize=object-size  does not seem to detect the
> illegal out-of-bound access by itself, but instead uses some very 
> coarse notion of object size. For example, the following was not
> detected in my tests:

So I found some time to look at this. It seems this should be
detected by -fsanitize=bounds. But it isn't. The problem is in 
'gcc/c-family/c-ubsan.c' in 'ubsan_instrument_bounds'. The
code starting with

 /* Detect flexible array members and suchlike.  */
  tree base = get_base_address (array);
  if (base && (TREE_CODE (base) == INDIRECT_REF
               || TREE_CODE (base) == MEM_REF))
    {

also somehow prevents accesses via pointers to arrays
to be instrumented. This is bad because it eliminates
a lot of useful checks.

The code looks very similar to the code in 'tree-vrp.c' 
where similar checks prevented corresponding compile-time 
warnings. There I recently added a condition 
(warn_array_bounds < 2) to get the warnings atleast
for -Warray-bounds=2.

But looking at both places some more, the code simply seems wrong 
to me. There should be no need at all in the backend to 
somehow detect the case of flexible array members. Not having 
the frontend set an expression for the bound should be enough.

So I removed both code paths completely, and - so far - couldn't
trigger false positives for Warray-bounds or -fsanitize-bounds.
I attached a file with some tests. lines marked with 'ubsan'
are currently detected while lines with 'ubsan!' are only detected
after removing the test for flexible array members.

Am I missing something?

Martin





/* { dg-do compile } */
/* { dg-options "-O3 -Warray-bounds=2" } */

extern void* malloc(unsigned long x);

int e[3];

struct f { int f[3]; };

extern void bar(int v[]);
extern void barn(int n, int v[]);

struct h {

	int i;
	int j[];
};

struct h0 {

	int i;
	int j[0];
};

struct h0b {

	int i;
	int j[0];
	int k;
};

struct h1 {

	int i;
	int j[1];
};

struct h1b {

	int i;
	int j[1];
	int k;
};

struct h3 {

	int i;
	int j[3];
};

struct h3b {

	int i;
	int j[3];
	int k;
};

void foo(int (*a)[3], int n, int (*b)[n])
{
	(*a)[3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan! */
	a[0][0] = 1;	// ok							
	a[1][0] = 1;	// ok
	a[1][3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan! */

	(*b)[n] = 1; // !							/* ubsan! */

	int c[3] = { 0 };
	int d[n];

	c[3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan */
	d[n] = 1;	// !							/* ubsan */

	e[3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan */

	struct f f;
	f.f[3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan */

	int m = 3;
	int g[m];
	g[3] = 1;	// !							/* ubsan */

	struct h* h = malloc(sizeof(struct h) + 3 * sizeof(int));
	struct h0* h0 = malloc(sizeof(struct h0) + 3 * sizeof(int));
	struct h1* h1 = malloc(sizeof(struct h1) + 3 * sizeof(int));
	struct h3* h3 = malloc(sizeof(struct h3));

	h->j[3] = 1;	// flexible array member
	h0->j[3] = 1;	// zero-sized array extension
	h1->j[3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan! */
	h3->j[3] = 1;	/* { dg-warning "subscript is above array bound" } */	/* ubsan! */


	struct h0b* h0b = malloc(sizeof(struct h) + 3 * sizeof(int));
	struct h1b* h1b = malloc(sizeof(struct h1b) + 3 * sizeof(int));
	struct h3b* h3b = malloc(sizeof(struct h3b));
	h0b->j[3] = 1;	// warning ?
	h1b->j[3] = 1;;	/* { dg-warning "subscript is above array bound" } */	/* ubsan */
	h3b->j[3] = 1;;	/* { dg-warning "subscript is above array bound" } */	/* ubsan */

	struct hn {

		int i;
		int j[n];
	} hn;

	hn.j[3] = 1;	// !			/* ubsan */

	struct hnb {

		int i;
		int j[n];
		int k;
	} hnb;

	hnb.j[3] = 1;	// !			/* ubsan */
}


int main()
{
	int a[3];
	foo(&a, 3, &a);
}





^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-02-24  1:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-26 19:54 array bounds, sanitizer, safe programming, and cilk array notation Martin Uecker
2015-01-27  0:08 ` Joseph Myers
2015-02-21 15:41   ` Marek Polacek
2015-02-23 17:53     ` Joseph Myers
2015-02-21 15:21 ` Marek Polacek
2015-02-22  0:41   ` Martin Uecker
2015-02-24  1:46     ` questionable checks for flexible array members in c-ubsan.c and tree-vrp.c (was: Re: array bounds, sanitizer, safe programming, and cilk array notation) Martin Uecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).