The utility of standard's semantics for overflow

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* The utility of standard's semantics for overflow
@ 2005-06-29  8:33 Michael Veksler
  2005-06-29  8:42 ` Robert Dewar
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Michael Veksler @ 2005-06-29  8:33 UTC (permalink / raw)
  To: gcc

After reading many messages on the subject of overflow
of signed integral values, I am inclined to think that the
standard was wrong to say that int overflow is undefined.

Of course this definition improves performance, but at
what cost? At the cost of program stability?

If the programmer wants a robust application,
then casting to unsigned must be present for almost any
usage of int. In the end, nobody will use int because it
is very difficult to prove that it does not wrap for all
possible input. And since the compiler will draw more
and more conclusions from "assume overflow never happens",
the effects will become more and more destructive.

Checking for int overflow after the fact is much simpler
than playing tricks with casting - before the fact. So
people will simply move to unsigned and implement
2's complement themselves.

To let the compilers gain the performance the standard
intended, it had to introduce a new type or a modifier.

Only
   for (nooverflow int i=0; i <= x ; ++i)
       ++count;
would be transformed to
  count+=x+1

Normal 'int' will then have an implementation defined
behavior of overflow. Unfortunately, the standard attempts
at improving legacy code, and as a result breaks legacy
code.

This is unlike aliasing, when most lines of code out there,
did not break aliasing rules (even before they were
introduced). Int overflow is violated by most lines of
code I have seen (it is very uncommon to find code that
asserts no overflow before a+b).

Also, the compiler itself may introduce new cases of
overflow (e.g. after transforming a*b+a*c  to a*(b+c),
when run with a==0, and b,c == MAX_INT).
I am not sure if this may create invalid assumptions
in later compiler passes (today's gcc or later). I did not
try to formally prove it either way. (I tend to think that
invalid assumptions will be introduced by, e.g.,  a
later VRP pass).

I don't know what gcc can do to improve the situation,
the standard is a thing gcc has to live with. Maybe
start by trying to affect c++0x ?

   Michael

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  8:33 The utility of standard's semantics for overflow Michael Veksler
@ 2005-06-29  8:42 ` Robert Dewar
  2005-06-29  9:04   ` Michael Veksler
  2005-06-29  8:48 ` Eric Botcazou
  2005-06-29  9:47 ` Theodore Papadopoulo
  2 siblings, 1 reply; 14+ messages in thread
From: Robert Dewar @ 2005-06-29  8:42 UTC (permalink / raw)
  To: Michael Veksler; +Cc: gcc

Michael Veksler wrote:

> If the programmer wants a robust application,
> then casting to unsigned must be present for almost any
> usage of int.

If you have a variable in your program that is signed but must
always be in the range of int, then int is the appropriate
representation. If the pressure in a tank must be in the
range -2**31 .. +2**31-1, it does not make the application
more robust to ensure that if the pressure goes "over" the max, it
suddenly turns negative. That's likely to be just as
disastrous as any other behavior in this serious error
situation.

In practice of course, the pressure in this example is
likely to be in a much more constrained range, and part
of making a "robust application" will be to ensure that
the value always remains in the required range. In a more
expressive language like Ada, the corresponding type would
be declared with the appropriate range, and an exception
would be raised if the value is outside this range.

In practice in a critical application, you are likely to
not want any exceptions, so proving such a program exception
free is one of the tasks that faces the applications programmer
in writing a reliable program. See for example:

http://www.praxis-cs.com/pdfs/Industrial_strength.pdf

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  8:33 The utility of standard's semantics for overflow Michael Veksler
  2005-06-29  8:42 ` Robert Dewar
@ 2005-06-29  8:48 ` Eric Botcazou
  2005-06-29 11:16   ` Michael Veksler
  2005-06-29  9:47 ` Theodore Papadopoulo
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Botcazou @ 2005-06-29  8:48 UTC (permalink / raw)
  To: Michael Veksler; +Cc: gcc

> This is unlike aliasing, when most lines of code out there,
> did not break aliasing rules (even before they were
> introduced).

Are you sure?  IIRC -fstrict-aliasing was once enabled at -O2 and then 
disabled to give people more time to fix their code.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  8:42 ` Robert Dewar
@ 2005-06-29  9:04   ` Michael Veksler
  2005-06-29  9:18     ` Robert Dewar
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Veksler @ 2005-06-29  9:04 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc

Robert Dewar <dewar@adacore.com> wrote on 29/06/2005 11:42:07:

> Michael Veksler wrote:
>
> > If the programmer wants a robust application,
> > then casting to unsigned must be present for almost any
> > usage of int.
>
> If you have a variable in your program that is signed but must
> always be in the range of int, then int is the appropriate
> representation. If the pressure in a tank must be in the
> range -2**31 .. +2**31-1, it does not make the application
> more robust to ensure that if the pressure goes "over" the max, it
> suddenly turns negative. That's likely to be just as
> disastrous as any other behavior in this serious error
> situation.

This is right to some extent, and I referred to it in my original
mail. I claim that it is easier to write a code that checks these cases
after the overflow, rather than before. I also claim that checking
overflows (as implied by the standard) results in almost pure
unsigned arithmetic, so why do we have signed arithmetic
to begin with?

>
> In practice of course, the pressure in this example is
> likely to be in a much more constrained range, and part
> of making a "robust application" will be to ensure that
> the value always remains in the required range. In a more
> expressive language like Ada, the corresponding type would
> be declared with the appropriate range, and an exception
> would be raised if the value is outside this range.

But this is not what C does. In C there is an assumption
"bugs like int overflow may be transformed into any other
possible bug". No exception need be raised.

> In practice in a critical application, you are likely to
> not want any exceptions, so proving such a program exception
> free is one of the tasks that faces the applications programmer
> in writing a reliable program. See for example:
>
> http://www.praxis-cs.com/pdfs/Industrial_strength.pdf
>

And as it is written in section 3
"... gives compiler writers substantial freedom to re-order expressions
..."
and then
"A more sound approach is to design a language so that these
ordering effects cannot occur".
This last quote can be implemented only by moving to modulo semantics.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  9:04   ` Michael Veksler
@ 2005-06-29  9:18     ` Robert Dewar
  0 siblings, 0 replies; 14+ messages in thread
From: Robert Dewar @ 2005-06-29  9:18 UTC (permalink / raw)
  To: Michael Veksler; +Cc: gcc

Michael Veksler wrote:

> This is right to some extent, and I referred to it in my original
> mail. I claim that it is easier to write a code that checks these cases
> after the overflow, rather than before. I also claim that checking
> overflows (as implied by the standard) results in almost pure
> unsigned arithmetic, so why do we have signed arithmetic
> to begin with?

er .. because we want 1 / (-1) to be -1 instead of 0?
       because we want -1 to be less than 1
       etc.

signed arithmetic is a bit different from unsigned arithmetic :-)

> But this is not what C does. In C there is an assumption
> "bugs like int overflow may be transformed into any other
> possible bug". No exception need be raised.

Right, which is like Ada with checks suppressed, but in general
in critical applications exceptions are turned off anyway, which
leaves Ada in EXACTLY the same situation as C (overflow erroneous)

> And as it is written in section 3
> "... gives compiler writers substantial freedom to re-order expressions
> ..."
> and then
> "A more sound approach is to design a language so that these
> ordering effects cannot occur".

> This last quote can be implemented only by moving to modulo semantics.

Overflow and ordering effects are quite different. And if you want to
avoid undefined for overflow, modulo semantics is only one of many ways
of doing this (none of which has been suggested or adopted by the C
standards committee -- when you find yourself disagreeing with this
committee in this way, you need to really do some studying to find out
why before deciding they were wrong).


> 
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  8:33 The utility of standard's semantics for overflow Michael Veksler
  2005-06-29  8:42 ` Robert Dewar
  2005-06-29  8:48 ` Eric Botcazou
@ 2005-06-29  9:47 ` Theodore Papadopoulo
  2005-06-29 12:14   ` Robert Dewar
  2 siblings, 1 reply; 14+ messages in thread
From: Theodore Papadopoulo @ 2005-06-29  9:47 UTC (permalink / raw)
  To: Michael Veksler; +Cc: gcc

On Wed, 2005-06-29 at 11:32 +0300, Michael Veksler wrote:

> This is unlike aliasing, when most lines of code out there,
> did not break aliasing rules (even before they were
> introduced). Int overflow is violated by most lines of
> code I have seen (it is very uncommon to find code that
> asserts no overflow before a+b).

Believe it or not most uses of integral values are made in such a way
that overflow is the exception rather than the rule (at least on general
computers where the int arithmetic and the memory is cheap, in embeded
system the situtation might differ somewhat even thought I have doubts
if the embedded processors are of 32b class, for 8/16b processor the
story is of course different). In most cases, the programmers choose the
type to allow for all the standard cases and do not look at the
possibility of overflow. How many loops are written using ints or
unsigned for very small integers where even a short might be
sufficient....

Untill now, there is a widespread assumption that 2^32 or 2^31 is
equivalent to infinity for most purposes, because those numbers will
never be reached (remember the unix clock ticks within a 32 bit
unsigned, which still has a few (counted) years to go) in any practical
situation (of course if a user wants to break the code and has switches
to provide initial values.

So unless you do arithmetics or combinatorics, most of the uses of
"wide" (ie > 32b) integral types semantically (ie in the programmer's
mind) assume that overflow does not happen in practise in the program.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  8:48 ` Eric Botcazou
@ 2005-06-29 11:16   ` Michael Veksler
  2005-06-29 16:19     ` Eric Botcazou
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Veksler @ 2005-06-29 11:16 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

Eric Botcazou <ebotcazou@libertysurf.fr> wrote on 29/06/2005 11:49:24:

> > This is unlike aliasing, when most lines of code out there,
> > did not break aliasing rules (even before they were
> > introduced).
>
> Are you sure?  IIRC -fstrict-aliasing was once enabled at -O2 and then
> disabled to give people more time to fix their code.
>

Yes, I am pretty sure. I said "most lines of code", not "most
applications",
to indicate the density difference. If each line of code has, e.g., 1%
chance
to violate overflow rules, and 0.01% chance to violate aliasing rules,
then for 10KLOC, you have:
 - probability of 63%   to violate aliasing rules
 - and 100% (99.99....9  with 43 nines) to violate overflow rules.

So most chances an a small application you will have at least
one violation for each rule. However, the number of violation will
be substantially different between the two:

The mean for number of violations for 10KLOC is:
- aliasing: 1
- overflow: 100

The numbers on probabilities are concocted, but they give the
general feeling of what I meant.
Aliasing problems may happen only when either:
1. Using type casts of pointers.
2. In C: implicit void* conversions.
3. Implicit pointer conversions through union.

These are less common (LOC percentage) in most code than
  i++, or a+b
Algorithm inputs are rarely validated for potential future overflow,
but outputs are normally validated for sane results. If you are
not supposed to produce a negative result, you are going to
catch it no later than at the beginning of your next stage
(assuming you have at least some sanity checks).

One more note: Programming languages should not only be sound,
they should fulfill a wide range of needs of the development cycle.
One of the needs is the ease of writing correct code and verifying it.
By defining a language rule that makes C/C++ very fast at the cost
of x2 more bugs is an unacceptable design decision. Most of these
new bugs will be latent bugs, and users will normally not encounter
them, but they will be there waiting for new bad inputs and an
improved compiler inteligence.

   Michael

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29  9:47 ` Theodore Papadopoulo
@ 2005-06-29 12:14   ` Robert Dewar
  2005-06-29 13:12     ` Dave Korn
  0 siblings, 1 reply; 14+ messages in thread
From: Robert Dewar @ 2005-06-29 12:14 UTC (permalink / raw)
  To: Theodore.Papadopoulo; +Cc: Michael Veksler, gcc

Theodore Papadopoulo wrote:

> So unless you do arithmetics or combinatorics, most of the uses of
> "wide" (ie > 32b) integral types semantically (ie in the programmer's
> mind) assume that overflow does not happen in practise in the program.

I think that's probably right. And in the context of this discussion,
what does happen to most programs if an int used with this assumption
overflows? Answer, it's probably a bug, and it is unlikely that
silent wrapping will correspond to correct behavior.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: The utility of standard's semantics for overflow
  2005-06-29 12:14   ` Robert Dewar
@ 2005-06-29 13:12     ` Dave Korn
  2005-06-29 13:35       ` Robert Dewar
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dave Korn @ 2005-06-29 13:12 UTC (permalink / raw)
  To: 'Robert Dewar', Theodore.Papadopoulo
  Cc: 'Michael Veksler', gcc

----Original Message----
>From: Robert Dewar
>Sent: 29 June 2005 13:14

> Theodore Papadopoulo wrote:
> 
>> So unless you do arithmetics or combinatorics, most of the uses of
>> "wide" (ie > 32b) integral types semantically (ie in the programmer's
>> mind) assume that overflow does not happen in practise in the program.
> 
> I think that's probably right. And in the context of this discussion,
> what does happen to most programs if an int used with this assumption
> overflows? Answer, it's probably a bug, and it is unlikely that
> silent wrapping will correspond to correct behavior.

  In fact, doesn't this suggest that in _most_ circumstances, *saturation*
would be the best behaviour?

  And of course, since it's undefined, a compiler could entirely
legitimately use saturating instructions (on a platform that supports them)
for signed int arithmetic.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29 13:12     ` Dave Korn
@ 2005-06-29 13:35       ` Robert Dewar
  2005-06-29 18:37       ` Olivier Galibert
  2005-07-01 18:30       ` Alexandre Oliva
  2 siblings, 0 replies; 14+ messages in thread
From: Robert Dewar @ 2005-06-29 13:35 UTC (permalink / raw)
  To: Dave Korn; +Cc: Theodore.Papadopoulo, 'Michael Veksler', gcc

Dave Korn wrote:

>   In fact, doesn't this suggest that in _most_ circumstances, *saturation*
> would be the best behaviour?

Actually I think a handlable exception, a la Ada, is the best solution.
Whether saturation is appropriate is really problem dependent. If you
are counting the number of primes less than bla, then saturating is not
likely to be helpful. If you are computing some real quantity with
minimal precision, saturation might be helpful
> 
>   And of course, since it's undefined, a compiler could entirely
> legitimately use saturating instructions (on a platform that supports them)
> for signed int arithmetic.

yes, of course, this would indeed be legitimate, and could be an option.
Almost anything is reasonable as an option. I think it would be quite
reasonable to have options for trapping (-ftrapv working nicely :-) and
for wrap around. But I would not normally make either of these the default
in C.
> 
>     cheers,
>       DaveK


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29 11:16   ` Michael Veksler
@ 2005-06-29 16:19     ` Eric Botcazou
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Botcazou @ 2005-06-29 16:19 UTC (permalink / raw)
  To: Michael Veksler; +Cc: gcc

> Yes, I am pretty sure. I said "most lines of code", not "most
> applications",
> to indicate the density difference. If each line of code has, e.g., 1%
> chance
> to violate overflow rules, and 0.01% chance to violate aliasing rules,
> then for 10KLOC, you have:
>  - probability of 63%   to violate aliasing rules
>  - and 100% (99.99....9  with 43 nines) to violate overflow rules.

Then there are different "most"s because, if each line of code has 1%
chance to violate overflow rules, "most" of them don't for reasonable 
definitions of "most".

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29 13:12     ` Dave Korn
  2005-06-29 13:35       ` Robert Dewar
@ 2005-06-29 18:37       ` Olivier Galibert
  2005-07-01 18:22         ` Alexandre Oliva
  2005-07-01 18:30       ` Alexandre Oliva
  2 siblings, 1 reply; 14+ messages in thread
From: Olivier Galibert @ 2005-06-29 18:37 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Robert Dewar',
	Theodore.Papadopoulo, 'Michael Veksler',
	gcc

On Wed, Jun 29, 2005 at 02:12:40PM +0100, Dave Korn wrote:
>   In fact, doesn't this suggest that in _most_ circumstances, *saturation*
> would be the best behaviour?

No, you'd be killing most emulators and a lot of virtual machine
implementations.

  char x = (char)((unsigned char)y + (unsigned char)z)

is too ugly to live.

  OG.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29 18:37       ` Olivier Galibert
@ 2005-07-01 18:22         ` Alexandre Oliva
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandre Oliva @ 2005-07-01 18:22 UTC (permalink / raw)
  To: Olivier Galibert
  Cc: Dave Korn, 'Robert Dewar',
	Theodore.Papadopoulo, 'Michael Veksler',
	gcc

On Jun 29, 2005, Olivier Galibert <galibert@pobox.com> wrote:

>   char x = (char)((unsigned char)y + (unsigned char)z)

> is too ugly to live.

Yeah, indeed, one shouldn't assume char is signed :-) :-)

(strictly speaking, you don't, but I thought this would be funny so I
went ahead and posted it)

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: The utility of standard's semantics for overflow
  2005-06-29 13:12     ` Dave Korn
  2005-06-29 13:35       ` Robert Dewar
  2005-06-29 18:37       ` Olivier Galibert
@ 2005-07-01 18:30       ` Alexandre Oliva
  2 siblings, 0 replies; 14+ messages in thread
From: Alexandre Oliva @ 2005-07-01 18:30 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Robert Dewar',
	Theodore.Papadopoulo, 'Michael Veksler',
	gcc

On Jun 29, 2005, "Dave Korn" <dave.korn@artimi.com> wrote:

>   In fact, doesn't this suggest that in _most_ circumstances, *saturation*
> would be the best behaviour?

If that's for user-introduced overflows, it could be useful in some
cases, but I wouldn't go as far as `most'.

For compiler-introduced overflows (e.g., re-association), wrap
semantics enable some optimizations that you just couldn't perform
otherwise.  Consider the already-mentioned example:

  (a + b) + c

if you re-group this as:

  a + (b + c)

and `b + c' overflows, and then the addition with a underflows, that's
not a problem, since modulo semantics will give you the correct
result.

Turning:

  a * b + a * c

into

  a * (b + c)

might be a problem should b + c overflow, but if you use modulo
semantics, you'll always get the correct result for cases in which
none of the original operations overflowed.

What you must be careful to avoid, however, is deriving further
assumptions from `(b + c) does not overflow'.  Since it's a
compiler-introduced operation in both cases, that's only valid if you
can assume modulo semantics, assuming it doesn't overflow will lead
you to wrong conclusions.

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2005-07-01 18:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-29  8:33 The utility of standard's semantics for overflow Michael Veksler
2005-06-29  8:42 ` Robert Dewar
2005-06-29  9:04   ` Michael Veksler
2005-06-29  9:18     ` Robert Dewar
2005-06-29  8:48 ` Eric Botcazou
2005-06-29 11:16   ` Michael Veksler
2005-06-29 16:19     ` Eric Botcazou
2005-06-29  9:47 ` Theodore Papadopoulo
2005-06-29 12:14   ` Robert Dewar
2005-06-29 13:12     ` Dave Korn
2005-06-29 13:35       ` Robert Dewar
2005-06-29 18:37       ` Olivier Galibert
2005-07-01 18:22         ` Alexandre Oliva
2005-07-01 18:30       ` Alexandre Oliva

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).