public inbox for gcc@gcc.gnu.org
* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-03 17:07 Richard Kenner
  2003-12-03 19:48 ` Felix Lee
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Kenner @ 2003-12-03 17:07 UTC (permalink / raw)
  To: felix.1; +Cc: gcc

    (checking whether combinations of -f switches produce correct
    code is a separate issue, 

No, that's the issue being discussed and what was meant by it being a
QA problem.


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 17:07 Should -fcross-jumping be part of -O1? Richard Kenner
@ 2003-12-03 19:48 ` Felix Lee
  0 siblings, 0 replies; 66+ messages in thread
From: Felix Lee @ 2003-12-03 19:48 UTC (permalink / raw)
  To: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner):
>     (checking whether combinations of -f switches produce correct
>     code is a separate issue, 
> No, that's the issue being discussed and what was meant by it being a
> QA problem.

no, that should be done no matter what -O options there are.
more -O options create more issues that have to be checked, and
making sure that 5 -O options work sensibly is a lot simpler than
making sure 4**5 combinations of -O options work sensibly.

and anyone who wants to play with all the -f flags is probably
going to be savvy enough to handle the problems they cause.  the
-O flags should be simple and reliable for general users.
--


* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-05 11:46 Gareth McCaughan
  0 siblings, 0 replies; 66+ messages in thread
From: Gareth McCaughan @ 2003-12-05 11:46 UTC (permalink / raw)
  To: gcc

Robert Dewar wrote:

> Yes, the purists are upset; they would like all x86 chips to vaporize.
> I think that unlikely to happen :-)

Don't be too sure. Have you seen how hot modern Intel processors
get? :-)

-- 
g



* Re: Should -fcross-jumping be part of -O1?
  2003-12-04 13:45                   ` Gabriel Dos Reis
  2003-12-04 13:51                     ` Scott Robert Ladd
@ 2003-12-05  1:04                     ` Robert Dewar
  1 sibling, 0 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-05  1:04 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: gcc

Gabriel Dos Reis wrote:

> Please, who mentions Fortran compared to language X?

The discussion is pretty much language independent, I merely used 
Fortran here as an example (not inappropriate if you are talking about 
numerical code). We are talking about gcc here, and certainly our
discussion of optimization in gcc applies to g77 as well as the
other languages supported by gcc.

> Sure it is.  If the result falls into the predicted interval, and
> therefore the expected decision is made based on that, then the
> computation is correct.

Predicted by what? Notice also that you are shifting your focus from
reproducibility to predictability (as I pointed out these are different,
and as you know the thread was originally about the effect of 
optimization on reproducibility).

> That is an artefact of your rhetoric. "correct" does have a useful
> content.  It is no 

Well, the above point is a bit unclear, but my point is that correct 
means different things to different people, so it is fundamentally a 
confusing term. I still don't know precisely how you define it.

> There is a whole mathematical approach to numerics based on
> interval arithmetic, upon which one can base software construction with
> predictable results -- as long as the compiler implementer does not go
> off playing language-lawyering nonsense.

I am of course perfectly familiar with the use of interval arithmetic,
but you will have to be much clearer on why you think this is relevant
to the discussion. There is some class of actions that you declare to
be nonsense, but you are far from clear in defining what this set of
actions might be. It would be helpful if you could be precise instead
of simply throwing terms like nonsense and correct around.

> In such context, algorithms are most of the time based on combination
> of numeric and symbolic computation and it is far more important to
> get a (numeric) computation correct than to get it fast.

Again, I don't know what you mean by correct. To me correctness can only
be established by reference to a language definition that defines what 
translation is correct. We can't define correct as "what gaby has in 
mind even if he does not define it precisely". Perhaps what would be
helpful is if, for any particular language as an example, you spelled
out exactly the set of semantics that would define correct for you.

> "correct" is not a rhetorical device.  

Well what I mean by a rhetorical device is that basically so far you are
saying "we want correctness [who could object?] and you should know what
that means since I can't be bothered to define it precisely."

> I don't have time left, and I'm not quite interested in a distracting
> language-lawyering game. But I feel it is really misleading to
> outright reject (correctness and predictability, which together imply)
> reproducibility. 

Predictability and correctness are certainly perfectly reasonable goals,
but correctness must be with respect to a language definition. For 
example, the Ada standard gives a very precise definition, but it is
carefully designed to be sufficiently non-deterministic to accommodate
all common machine arithmetics. Now in an IEEE environment, it is
perfectly reasonable to add a precise set of rules to map Ada operations
into well-defined IEEE operations (not completely removing 
non-determinism, since as I am sure you know, the IEEE standard is not
100% deterministic, but it is close enough for most purposes). See
Sam Figueroa's thesis for a precise proposal for doing this in Ada 95.
Or see Apple's spec for Pascal.

Language lawyering is precisely about defining what correct means. If
you don't like a spec for a particular language because it is 
incomplete in your view, you can't simply assume that everyone will
agree on the delicate task of precisely mapping language semantics
to IEEE operations; there are quite a few subtle issues to be
dealt with (e.g. the choice of base types in Ada 95).

Once you stress correctness, you are in the language lawyering business
whether you like it or not.

> No, I don't think so.  "Correct" always implies expected results in a
> given context.  It may become bad only when/if one plays distracting
> lawyering games, but then that is true for virtually anything.

That's right, and those expected results are expected according to some
well defined language. Once again, we need a definition, not just vague
ideas of what is or is not obviously correct!

Language lawyering, which you dismiss as distracting, is precisely
about determining whether results are correct according to a precise
definition.

You seem to insist on the idea that it is obvious what is correct or 
not, and you don't need a precise definition suitable for perusal by
language lawyers.

Well I think that means you don't understand all the issues. It is not 
at ALL obvious what the exact criteria for correctness should be if you
demand full IEEE predictability. For any given language, it is a 
non-trivial task to determine what the rules are.

For example, what exactly should the rules be for conversion of double to 
string in C? Insisting on totally accurate conversion is a significant
burden, and requires the ability to output a very large number of
decimal digits. The IEEE standard has well-defined precision bounds for
conversions of this type, but we know better algorithms now, so one is
tempted to be stricter than IEEE in this regard, and some would consider
that totally precise conversion is worth the cost. This is not a simple
discussion, and it is far from obvious what the choice should be.
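
To make the round-trip question concrete, here is a small C sketch
(illustrative, not from the original message): 17 significant decimal
digits always suffice to reproduce an IEEE double exactly, while the 15
digits many programs print do not.

    #include <stdio.h>
    #include <stdlib.h>

    int main (void)
    {
      double x = 0.1 + 0.2;  /* one ulp above the nearest double to 0.3 */
      char buf[64];
      double y;

      sprintf (buf, "%.15g", x);   /* prints "0.3": too few digits */
      y = strtod (buf, NULL);
      printf ("%%.15g round-trips: %s\n", x == y ? "yes" : "no");  /* no */

      sprintf (buf, "%.17g", x);   /* "0.30000000000000004": exact */
      y = strtod (buf, NULL);
      printf ("%%.17g round-trips: %s\n", x == y ? "yes" : "no");  /* yes */
      return 0;
    }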

Robert



* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  1:27             ` Robert Dewar
  2003-12-04  1:49               ` Gabriel Dos Reis
@ 2003-12-04 21:03               ` Toon Moene
  1 sibling, 0 replies; 66+ messages in thread
From: Toon Moene @ 2003-12-04 21:03 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Gabriel Dos Reis, gcc

Robert Dewar wrote:

> We are not talking minor slow downs here, but slow downs by a 
> significant factor (which can easily be 2 or 3). Many users of fpt 
> cannot even consider this level of inefficiency.

We would consider it, and then choose the alternative that's faster.

One of the most interesting quotes I read from Kahan is (paraphrased): 
Users of Cray hardware don't care - their software is robust even 
against the cavalier approach of Cray floating-point arithmetic.

Of course, this is the wrong way around: Cray flourished with its 
division that could be off by five ulps because the crowd that could 
afford a Cray had codes that didn't _need_ the foregone exactness.

-- 
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction)


* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  1:27     ` Mike Stump
@ 2003-12-04 18:40       ` Joe Buck
  0 siblings, 0 replies; 66+ messages in thread
From: Joe Buck @ 2003-12-04 18:40 UTC (permalink / raw)
  To: Mike Stump; +Cc: Jan Hubicka, Scott Robert Ladd, gcc mailing list

On Wed, Dec 03, 2003 at 05:17:10PM -0800, Mike Stump wrote:
> As one of the people concerned with compilation speed, -O0 is fine.  We 
> want it to mean: go fast, be easy to debug, include no optimizations that 
> slow compile time, but possibly include optimizations that improve 
> compile time, provided they are safe, well tested, and robust.  With -O2, 
> we don't care about compilation speed, well, we do, but not as much as we 
> do about -O0.

I see no reason to demand that -O0 be completely stupid.  If we have

	var = EXPRESSION;

and EXPRESSION has ten arithmetic operations, any simplification that
assigns the correct value to var is just fine, because what matters is
that the state as seen by gdb before and after the statement is correct.

> So, unless someone wants -O0 to not include some optimizations, I think 
> the status quo is fine.  From most of what I've seen to date on the list, 
> it looks like having more optimizations (from a very limited set of all 
> optimizations) at -O0 would be fine.

Exactly.  Some have suggested having the default be an optimization level
that does a few optimizations but preserves debuggability, but keeping an
-O0 that does none.  This is useless, as that -O0 is likely to be slower
to compile (a bunch of extra useless operations have to be converted into
RTL and then to assembler).


* Re: Should -fcross-jumping be part of -O1?
  2003-12-04 13:45                   ` Gabriel Dos Reis
@ 2003-12-04 13:51                     ` Scott Robert Ladd
  2003-12-05  1:04                     ` Robert Dewar
  1 sibling, 0 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-04 13:51 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: gcc

Gabriel Dos Reis wrote:
> Sure it is.  If the result falls into the predicted interval, and
> therefore the expected decision is made based on that, then the
> computation is correct.

All calculations involve a certain level of uncertainty, whether done on 
paper or in silicon. Alas, far too many programmers fail to understand 
basic mathematical principles.

Of course, we also live in a world where someone can enter a value in 
newtons into a program expecting metric units, thus dooming expensive 
spacecraft. In many ways, complexities with computer math arise more 
from human perceptions than from technical considerations.

> There is a whole mathematical approach to numerics based on
> interval arithmetic, upon which one can base software construction with
> predictable results -- as long as the compiler implementer does not go
> off playing language-lawyering nonsense.  Have a look for
> example at how interval arithmetic is used in computational geometry
> (http://www.cgal.org/ being the most popular), or in CAGD, where it is
> quite important to get the correct topological structure of shapes.
> In such contexts, algorithms are most of the time based on a combination
> of numeric and symbolic computation, and it is far more important to
> get a (numeric) computation correct than to get it fast.

Exactly. Interval arithmetic is an excellent tool. It's implemented by 
specialized tools like MATLAB, Maple, and Mathematica, and through 
libraries for Lisp and C++. Here's a good collection of resources for 
those who aren't familiar with the topic:

     http://www.cs.utep.edu/interval-comp/main.html

Some effort has been made toward integrating interval mathematics into 
GCC, as per the following paper:

     http://home.ku.edu.tr/~ahakkas/publications/comp-supp.pdf
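
For readers new to the technique, here is a minimal sketch of interval
addition with directed rounding in C99 (illustrative, not from either
reference).  Real libraries also handle the sign cases of multiplication,
overflow, and compiler reordering; with GCC the rounding-mode switches
are only safe if the compiler is told not to assume the default mode
(-frounding-math, where supported).

    #include <fenv.h>
    #pragma STDC FENV_ACCESS ON    /* we change the rounding mode */

    typedef struct { double lo, hi; } interval;

    /* [a.lo,a.hi] + [b.lo,b.hi], rounded outward so the true
       mathematical sum is guaranteed to lie inside the result.  */
    interval iv_add (interval a, interval b)
    {
      interval r;
      fesetround (FE_DOWNWARD);    /* lower bound toward -infinity */
      r.lo = a.lo + b.lo;
      fesetround (FE_UPWARD);      /* upper bound toward +infinity */
      r.hi = a.hi + b.hi;
      fesetround (FE_TONEAREST);   /* restore the default */
      return r;
    }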

> I don't have time left, and I'm not quite interested in a distracting
> language-lawyering game. But I feel it is really misleading to
> outright reject (correctness and predictability, which together imply)
> reproducibility. 

I tend to agree; there exist trade-offs based on the limitations of 
hardware and software, but I find more errors in mathematics from people 
who do not understand basic concepts like rounding and significant 
digits. An absolutely correct result that takes too long to calculate 
(say, in a real-time system) is no better than an instant answer that is 
wrong (in any system).

Rather than seeking a generic answer, perhaps a better approach is for 
someone (me?) to document how GCC options affect accuracy and speed, to 
educate those who care about such matters? A practical guide might be 
useful, even though a number of academic papers exist on the subject.

Hmmm... I need to think about this some more.

..Scott

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing



* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  7:33                 ` Robert Dewar
@ 2003-12-04 13:45                   ` Gabriel Dos Reis
  2003-12-04 13:51                     ` Scott Robert Ladd
  2003-12-05  1:04                     ` Robert Dewar
  0 siblings, 2 replies; 66+ messages in thread
From: Gabriel Dos Reis @ 2003-12-04 13:45 UTC (permalink / raw)
  To: gcc

Robert Dewar <dewar@gnat.com> writes:

| > I know.  There are many and many. Many users of fpt also can't
| > understand why someone would value speed over correctness when
| > correctness is the purpose of the computation.
| 
| That's going too far. When you get different results in a Fortran
| compiler at -O0 and -O2, then there is no absolute judgment that one
| semantics is more correct than the other.

Please, who mentions Fortran compared to language X?

I don't think it is appropriate to add more to the confusion.

| Most likely on an x86, the
| optimized version will in fact give more accurate results. Of course
| predictability may be more important than accuracy, but it is not
| helpful to use the word correct in this context.

Sure it is.  If the result falls into the predicted interval, and
therefore the expected decision is made based on that, then the
computation is correct.

[...]

| So I think it is better to avoid using correct here, it is just a
| rhetorical device with no particular content.

That is an artefact of your rhetoric. "correct" does have a useful
content.  It is no 

There is a whole mathematical approach to numerics based on
interval arithmetic, upon which one can base software construction with
predictable results -- as long as the compiler implementer does not go
off playing language-lawyering nonsense.  Have a look for
example at how interval arithmetic is used in computational geometry
(http://www.cgal.org/ being the most popular), or in CAGD, where it is
quite important to get the correct topological structure of shapes.
In such contexts, algorithms are most of the time based on a combination
of numeric and symbolic computation, and it is far more important to
get a (numeric) computation correct than to get it fast.

"correct" is not a rhetorical device.  

I don't have time left, and I'm not quite interested in a distracting
language-lawyering game. But I feel it is really misleading to
outright reject (correctness and predictability, which together imply)
reproducibility. 

[...]

| These four criteria are quite different. Note in particular, that
| you are identifying correctness with reproducibility, but I think that
| most people who would like to use the term would identify it with
| predictability, which is a different criterion (predictability
| implies reproducibility, but not vice versa).
|
| You surely in fact shift in your use of the term. At first in your
| message you are definitely talking about reproducibility (after all
| this whole thread was about effects of optimization). But at the end
| where you say:
| 
| "correctness is the purpose of the computation" I think you must be
| talking about predictability.
| 
| But of course there are others for whom floating-point arithmetic is
| simply an approximation of real arithmetic. With this view, the
| computation is never correct, but doing it with maximum precision is
| at least the most correct possible.
| 
| That's why correctness is a bad term.

No, I don't think so.  "Correct" always implies expected results in a
given context.  It may become bad only when/if one plays distracting
lawyering games, but then that is true for virtually anything.

-- Gaby


* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  1:49               ` Gabriel Dos Reis
@ 2003-12-04  7:33                 ` Robert Dewar
  2003-12-04 13:45                   ` Gabriel Dos Reis
  0 siblings, 1 reply; 66+ messages in thread
From: Robert Dewar @ 2003-12-04  7:33 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: gcc

> I know.  There are many and many. Many users of fpt also can't
> understand why someone would value speed over correctness when
> correctness is the purpose of the computation.

That's going too far. When you get different results in a Fortran 
compiler at -O0 and -O2, then there is no absolute judgment that one 
semantics is more correct than the other. Most likely on an x86, the
optimized version will in fact give more accurate results. Of course
predictability may be more important than accuracy, but it is not 
helpful to use the word correct in this context.
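
A concrete instance of that last point (an illustrative sketch; the
outcome depends on target, flags, and register allocation): on x86, if
the sum below stays in an 80-bit x87 register, the computation keeps a
bit that a 64-bit spill to memory throws away.

    #include <stdio.h>

    int main (void)
    {
      volatile double a = 1e16, b = 1.0;  /* volatile: defeat folding */
      /* Mathematically (a + b) - a == 1.0.  In 64-bit double,
         1e16 + 1 rounds back to 1e16, giving r == 0.0; kept in an
         80-bit x87 register the sum is exact, giving r == 1.0.  */
      double r = (a + b) - a;
      printf ("%g\n", r);
      return 0;
    }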

A properly analyzed Ada floating-point program, for example, is required 
to work correctly whether or not there is extra precision. If the result 
is wrong with extra precision, then the program is wrong in the sense 
that it has not been analyzed with respect to the Ada floating-point 
model (which is quite precise).

In fact many floating-point programmers regard fpt as sort of
"incorrect" all the time, and would regard higher precision and
accuracy as always desirable.

So I think it is better to avoid using correct here, it is just a 
rhetorical device with no particular content.

The fact of the matter is that there are several inconsistent goals that 
you might want to satisfy in a floating-point algorithm:

- Reproducible results independent of optimization. Useful for
   debugging and for giving confidence that optimization is not
   messing up.

- Maximum accuracy for the calculations performed

- Maximum speed of calculation

- Predictability, you want to know EXACTLY what operations will be
   performed and how the language semantics map onto the hardware, and
   by the way, know exactly what the hardware does in all cases (not
   true of all machines by any means).

These four criteria are quite different. Note in particular, that
you are identifying correctness with reproducibility, but I think that
most people who would like to use the term would identify it with
predictability, which is a different criterion (predictability
implies reproducibility, but not vice versa).

You do, in fact, shift in your use of the term. At first in your 
message you are definitely talking about reproducibility (after all this 
whole thread was about the effects of optimization). But at the end where 
you say:

"correctness is the purpose of the computation" I think you must be 
talking about predictability.

But of course there are others for whom floating-point arithmetic is 
simply an approximation of real arithmetic. With this view, the 
computation is never correct, but doing it with maximum precision is at 
least the most correct possible.

That's why correctness is a bad term. The purpose of the computation 
after all is not correctness, it is to get the right results. Some 
notion of correctness (which always must be with respect to a 
specification, so you cannot just ignore the language standard, but 
rather you need to formally supplement it) may be a means to this 
purpose, but it is not the purpose itself.

I think it is quite useful to separate the four criteria here, because 
different constituencies have quite different views. A useful compiler 
should indeed be able to accommodate all four views, and various useful
balances between them.

Basically you can't have it both ways. Either you are trying to be 
pragmatic here, in which case the formal notion of correctness is not a 
useful focus, or you are trying to be formal and precise, and you are 
interested in a formal notion of correctness. But for that you have to 
be a language lawyer, in the sense that correctness can only be shown 
with respect to a formal definition, interpreted formally. Yes, the 
language definition is often inadequate and must be supplemented, for
example with a schema mapping language semantics to IEEE semantics,
but once supplemented, then you definitely want to be in full language 
lawyer mode to make sure that the implementation corresponds to this scheme.

Again I refer you to Sam Figueroa's thesis for a much more extensive 
discussion of these issues.

* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  1:27             ` Robert Dewar
@ 2003-12-04  1:49               ` Gabriel Dos Reis
  2003-12-04  7:33                 ` Robert Dewar
  2003-12-04 21:03               ` Toon Moene
  1 sibling, 1 reply; 66+ messages in thread
From: Gabriel Dos Reis @ 2003-12-04  1:49 UTC (permalink / raw)
  To: gcc

Robert Dewar <dewar@gnat.com> writes:

| Gabriel Dos Reis wrote:
| 
| > Robert Dewar <dewar@gnat.com> writes:
| > | Scott Robert Ladd wrote:
| > | | > And then you have the IEEE-754/IEC-60559 purists who insistent
| > that
| > | > using the 80-bit registers is a platform-specific optimization that
| > | > destroys reproducibility.
| > | | Yes, and they are right but:
| > | | a) there is nothing in most language standards that requires
| > | reproducibility.
| > And that is a language lawyer point of view that they usually
| > consider completely useless for their tasks.  To some extent, I
| > think they are right.

[...]

| Indeed if someone took that attitude that optimization must never affect
| floating-point accuracy, I would say that *this* is the language-lawyer
| position that ignores pragmatic reality!

I'm talking of the attitude that rejects reproducibility outright on
the ground that there is nothing in most language standards that requires
reproducibility.  In fact, nothing in most language standards requires
useful implementations.  So should usefulness be rejected outright?  
The answer should be no.  Usefulness includes many things, and depends
on the application.  If an application requires some degree of
reproducibility, it would be quite pointless to argue that nothing in
most language standards requires reproducibility.

| We are not talking minor slow downs here, but slow downs by a
| significant factor (which can easily be 2 or 3). Many users of fpt
| cannot even consider this level of inefficiency.

I know.  There are many and many. Many users of fpt also can't
understand why someone would value speed over correctness when
correctness is the purpose of the computation.

-- Gaby


* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  0:35   ` Jan Hubicka
@ 2003-12-04  1:27     ` Mike Stump
  2003-12-04 18:40       ` Joe Buck
  0 siblings, 1 reply; 66+ messages in thread
From: Mike Stump @ 2003-12-04  1:27 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Scott Robert Ladd, gcc mailing list

On Wednesday, December 3, 2003, at 04:14 PM, Jan Hubicka wrote:
> So that way of thinking leaves us with discussion on whether we
> actually need multiple levels of size/speed settings..

As one of the people concerned with compilation speed, -O0 is fine.  We 
want it to mean: go fast, be easy to debug, include no optimizations that 
slow compile time, but possibly include optimizations that improve 
compile time, provided they are safe, well tested, and robust.  With -O2, 
we don't care about compilation speed, well, we do, but not as much as we 
do about -O0.

So, unless someone wants -O0 to not include some optimizations, I think 
the status quo is fine.  From most of what I've seen to date on the list, 
it looks like having more optimizations (from a very limited set of all 
optimizations) at -O0 would be fine.


* Re: Should -fcross-jumping be part of -O1?
  2003-12-04  1:17           ` Gabriel Dos Reis
@ 2003-12-04  1:27             ` Robert Dewar
  2003-12-04  1:49               ` Gabriel Dos Reis
  2003-12-04 21:03               ` Toon Moene
  0 siblings, 2 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-04  1:27 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: gcc

Gabriel Dos Reis wrote:

> Robert Dewar <dewar@gnat.com> writes:
> 
> | Scott Robert Ladd wrote:
> | 
> | > And then you have the IEEE-754/IEC-60559 purists who insist that
> | > using the 80-bit registers is a platform-specific optimization that
> | > destroys reproducibility.
> | 
> | Yes, and they are right but:
> | 
> | a) there is nothing in most language standards that requires
> | reproducibility.
> 
> And that is a language lawyer point of view that they usually
> consider completely useless for their tasks.  To some extent, I think
> they are right.

That may sound reasonable, but a requirement of complete reproducibility 
of results for floating-point accuracy would cripple efficiency for the
great majority of programmers, who do not care if they get extra 
precision or, especially, extra range. It is the range that is hard. 
Very few programs *rely* on overflow behavior, and slowing down programs 
by a large factor just so that overflow is consistent would seem absurd 
to most real-life programmers using floating-point.

So this is not at all a language lawyer point of view, but a very 
pragmatic point of view that recognizes that efficiency and purity
are often at odds.

In this case, we are talking about languages which don't define the 
precise results from fpt anyway. In such circumstances, you have to 
regard the results as somewhat non-deterministic.

Ada in fact precisely characterizes the magnitude of allowed
non-deterministic behavior.

Note that in general, we do not expect program behavior to be 
independent of optimization level. If we write a program construct that 
is undefined, or defined to be non-deterministic, then different 
compilers can give different results, different versions of the same
compiler can give different results, and different optimization levels 
can give different results.

Indeed if someone took the attitude that optimization must never affect
floating-point accuracy, I would say that *this* is the language-lawyer
position that ignores pragmatic reality!

We are not talking minor slow downs here, but slow downs by a 
significant factor (which can easily be 2 or 3). Many users of fpt 
cannot even consider this level of inefficiency.

* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:53         ` Robert Dewar
@ 2003-12-04  1:17           ` Gabriel Dos Reis
  2003-12-04  1:27             ` Robert Dewar
  0 siblings, 1 reply; 66+ messages in thread
From: Gabriel Dos Reis @ 2003-12-04  1:17 UTC (permalink / raw)
  To: gcc

Robert Dewar <dewar@gnat.com> writes:

| Scott Robert Ladd wrote:
| 
| > And then you have the IEEE-754/IEC-60559 purists who insist that
| > using the 80-bit registers is a platform-specific optimization that
| > destroys reproducibility.
| 
| Yes, and they are right but:
| 
| a) there is nothing in most language standards that requires
| reproducibility.

And that is a language lawyer point of view that they usually
consider completely useless for their tasks.  To some extent, I think
they are right.

-- Gaby


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:12 ` Scott Robert Ladd
                     ` (2 preceding siblings ...)
  2003-12-03 16:02   ` Paul Jarc
@ 2003-12-04  0:35   ` Jan Hubicka
  2003-12-04  1:27     ` Mike Stump
  3 siblings, 1 reply; 66+ messages in thread
From: Jan Hubicka @ 2003-12-04  0:35 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc mailing list

> Richard Kenner wrote:
> >I don't see why it would be any worse than all the -f options we
> >currently have.
> 
> I agree.
> 
> A thought experiment: Allow a complexity of switches for those of us
> that want to fine-tune code, while implementing "general purpose"
> switches (e.g., -Os/-Ospeed) for people who want less complexity in their
> use of GCC.
> 
> Based on personal experience and the conversation herein, optimization
> appears to be a five-dimensional space with these axes:
> 
>   * compilation speed (-Ospeed)
>   * debugability      (-Odebug)
>   * code size	      (-Osize)
>   * execution speed   (-Ofast)
>   * FP accuracy       (-Oaccuracy)
> 
> The switches would be weighted; for example, specifying -Osize=0
> -Ospeed=2 -Oaccuracy=3 would request the highest-accuracy, with speed as
> a secondary consideration, and no concern for code size.
> 
> Hmmm... I'm still not certain if that's too complicated. for right now,
> I'm just tossing out ideas; working out the interactions of the axes
> would be the next step, if the model makes sense.

Too many choices will give us no way to actually check that the switches
are behaving as expected.  We already have enough surprises where -Os
produces faster code than -O2 and -O2 smaller code than -Os.  In order to
make the settings useful, I think we need periodic performance, code
size and compilation time testing of each switch, so there need to be
relatively few of them.

In order to reduce the number of possible settings, I think we can keep
-Ospeed as -O1.  There are not very many optimization choices that
differ radically for size/speed and are fast enough for -O1.  Similarly
-Osize/-Ofast should be exclusive (it does not make much sense to specify
things like "speed is 3 times as important to me as size" and expect the
compiler to get it in the way you expect it to behave).
For -Oaccuracy I think the current -ffast-math is doing a good job.  It
gives the user the feeling "hey, I am doing something dangerous but
cool", and if someone needs something more specific, he needs to
understand what, so he needs to dig into the -f switches dealing with
mathematics.

So that way of thinking leaves us with discussion on whether we actually
need multiple levels of size/speed settings.. 

Just my 2 cents...
Honza
> 
> -- 
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing
> 
> 


* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-03 21:35 Stephan T. Lavavej
  0 siblings, 0 replies; 66+ messages in thread
From: Stephan T. Lavavej @ 2003-12-03 21:35 UTC (permalink / raw)
  To: gcc

[Felix Lee]
> I think saying what direction you're going is less confusing
> than saying what axis you're moving on.  "quickly", "small",
> "fast" instead of "speed", "size", "time".  use words that
> will fit the sentence, "I want my code ____".

I disagree strongly with making the optimization options even more
complicated than they are now; O0/O1/O2/O3/Os is plenty already.

But, /if/ gcc were to take the confusing and complicated route, I suggest
that the "increase compilation speed" option be called -Onow.  :->

Stephan T. Lavavej
http://nuwen.net

* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-03 17:26 Nathanael Nerode
  0 siblings, 0 replies; 66+ messages in thread
From: Nathanael Nerode @ 2003-12-03 17:26 UTC (permalink / raw)
  To: zack, gcc

Zack wrote:
>My suggested constellation of -O switches:
>
>-O0    No optimization whatsoever.  
>       Except maybe do obviously-dead code elimination.
>-O1    Optimize, but speed of compilation is more important than
>       speed or size of generated code.  Possibly this, not -O0,
>       should be the default mode.
>
>-O3/-Ospeed
>       Optimize for speed at the expense of size.
>-Os/-Osize
>       Optimize for size at the expense of speed.
>-O2/-Obalanced
>       Produce a balance of speed and size optimizations acceptable
>       for most code.
I must agree with these choices.  :-)

>Two factors that are *not* considered in any of these switches are
>ease of debugging, and scope (function/unit/program) of optimization.
>I do not think it is appropriate to exclude optimizations from any
>level just because they mess up debugging info,
I slightly disagree.  I think optimizations which mess up debugging info
should be excluded from -O0 always.  One of the primary purposes of "not
optimizing" is making debugging easier.  I think messing up debugging info
should be allowed for all other optimization levels, though.

> and scope of
>optimization is a detail that shouldn't be exposed at the level of
>these switches.  If it makes sense in terms of the speed/size/compile
>speed tradeoffs to do whole-program optimization at -O1 then we should
>do it at -O1.  We can have -f switches for that.
I agree.

-- 
Nathanael Nerode  <neroden at gcc.gnu.org>
http://home.twcny.rr.com/nerode/neroden/fdl.html


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 17:03           ` Ian Lance Taylor
@ 2003-12-03 17:16             ` Robert Dewar
  0 siblings, 0 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-03 17:16 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Scott Robert Ladd, gcc mailing list

Ian Lance Taylor wrote:

> No, I do mean what I am saying.  Of course C doesn't give you any
> useful guarantees (I haven't looked at the latest standard).

You should :-)

> But you
> can prove the algorithm in the abstract.  Then you can use gcc
> -fno-fast-math -ffloat-store and get what you want--you don't even
> need -ffloat-store if you are using software floating point.

Of course you still need to understand the C rules for intermediate
values ... and you are still talking in a very C specific universe.
GCC is not only a C compiler :-)

> So to me
> accurate floating point means -fno-fast-math -ffloat-store

Well you can use words any way you want, but this is an odd definition 
for most people, so you had better always define what you mean :-)

> But you are quite right that that is not what most people want, and
> probably not what many people would expect.

Right, such as those folks interested in performance, of whom there are
one or two in the fpt world :-)

Also note that your description of what you are doing is *precisely* 
what I referred to in my message. You are not relying only on the IEEE
standard, but on an auxiliary "standard" which is the behavior of GCC
when compiling C using -fno-fast-math -ffloat-store. That was my point.
You can't just say you are using the IEEE standard, you must always add
the assumptions you are making about the translation of the high level 
language you are using.



* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:42         ` Robert Dewar
@ 2003-12-03 17:03           ` Ian Lance Taylor
  2003-12-03 17:16             ` Robert Dewar
  0 siblings, 1 reply; 66+ messages in thread
From: Ian Lance Taylor @ 2003-12-03 17:03 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Scott Robert Ladd, gcc mailing list

Robert Dewar <dewar@gnat.com> writes:

> Ian Lance Taylor wrote:
> 
> > Granted.  I was thinking of the ability to prove algorithm
> > characteristics based on the IEEE spec, which is something I've done
> > in the past.  To me that is accurate floating point.  But I can see
> > that others may think that accurate floating point means something
> > else.
> 
> You don't really mean what you are saying here. The IEEE spec is a
> hardware/software spec about the behavior of certain floating-point
> operations. It has hardly anything to say about the relationship
> between these operations and semantics of high level languages.

No, I do mean what I am saying.  Of course C doesn't give you any
useful guarantees (I haven't looked at the latest standard).  But you
can prove the algorithm in the abstract.  Then you can use gcc
-fno-fast-math -ffloat-store and get what you want--you don't even
need -ffloat-store if you are using software floating point.  So to me
accurate floating point means -fno-fast-math -ffloat-store.
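
A standard example of the kind of provable algorithm meant here
(illustrative, not Ian's code): Kahan's compensated summation, whose
error bound holds only if every operation rounds exactly as written --
precisely what -ffast-math would license the compiler to break by
reassociating.

    /* Compile with something like: gcc -fno-fast-math -ffloat-store
       The proof of the error bound relies on strict IEEE rounding of
       each individual operation.  */
    double kahan_sum (const double *x, int n)
    {
      double sum = 0.0, c = 0.0;  /* c carries the lost low-order bits */
      int i;
      for (i = 0; i < n; i++)
        {
          double y = x[i] - c;
          double t = sum + y;
          c = (t - sum) - y;      /* algebraically zero; numerically the
                                     rounding error of sum + y */
          sum = t;
        }
      return sum;
    }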

But you are quite right that that is not what most people want, and
probably not what many people would expect.

Ian


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:32       ` Scott Robert Ladd
@ 2003-12-03 16:53         ` Robert Dewar
  2003-12-04  1:17           ` Gabriel Dos Reis
  0 siblings, 1 reply; 66+ messages in thread
From: Robert Dewar @ 2003-12-03 16:53 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Ian Lance Taylor, gcc mailing list

Scott Robert Ladd wrote:

> And then you have the IEEE-754/IEC-60559 purists who insist that 
> using the 80-bit registers is a platform-specific optimization that 
> destroys reproducibility.

Yes, and they are right but:

a) there is nothing in most language standards that requires 
reproducibility. Even the IEEE standard is not 100% deterministic (issue 
of double rounding for denormals, and accuracy of some operations like
conversions).

b) Sticking to precise 32-bit or 64-bit IEEE semantics, *including* 
range checks, is prohibitively expensive, as we have found out in the 
Java world. You can rant and rave at Intel for this, but the practical 
state of things is that no language standard is going to mandate a 
behavior that makes it impractical to implement on x86 chips, and 
furthermore if a language standard attempts to do so, it will be ignored 
(again, note our experience with Java here).

So if reproducibility includes getting infinities for overflows, forget 
about reproducibility from a practical point of view if you have any
concerns about efficiency at all.
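
To see why range, rather than precision, is the sticking point, consider
this sketch (illustrative, not from the message): with 64-bit operations
the intermediate product overflows to infinity, while the x87 format's
much wider exponent range absorbs it.

    #include <stdio.h>

    int main (void)
    {
      volatile double x = 1e200;  /* volatile: defeat constant folding */
      /* x * x is about 1e400: it overflows 64-bit double (max ~1.8e308)
         and gives +inf, so r == inf.  Kept in an 80-bit x87 register
         (exponents up to ~1e4932) the product survives and r == 1e200.
         Forcing stores (e.g. with -ffloat-store) restores the
         reproducible, infinite answer -- at a price.  */
      double r = (x * x) / 1e200;
      printf ("%g\n", r);
      return 0;
    }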

Yes, the purists are upset; they would like all x86 chips to vaporize.
I think that unlikely to happen :-)



* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:02   ` Paul Jarc
@ 2003-12-03 16:46     ` Felix Lee
  0 siblings, 0 replies; 66+ messages in thread
From: Felix Lee @ 2003-12-03 16:46 UTC (permalink / raw)
  To: gcc mailing list

prj@po.cwru.edu (Paul Jarc):
> Scott Robert Ladd <coyote@coyotegulch.com> wrote:
> >    * compilation speed (-Ospeed)
> I think "time" might be less confusing than "speed".

I think saying what direction you're going is less confusing than
saying what axis you're moving on.  "quickly", "small", "fast"
instead of "speed", "size", "time".  use words that will fit the
sentence, "I want my code ____".
--


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:26       ` Ian Lance Taylor
@ 2003-12-03 16:42         ` Robert Dewar
  2003-12-03 17:03           ` Ian Lance Taylor
  0 siblings, 1 reply; 66+ messages in thread
From: Robert Dewar @ 2003-12-03 16:42 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Scott Robert Ladd, gcc mailing list

Ian Lance Taylor wrote:

> Granted.  I was thinking of the ability to prove algorithm
> characteristics based on the IEEE spec, which is something I've done
> in the past.  To me that is accurate floating point.  But I can see
> that others may think that accurate floating point means something
> else.

You don't really mean what you are saying here. The IEEE spec is a 
hardware/software spec about the behavior of certain floating-point 
operations. It has hardly anything to say about the relationship between 
these operations and semantics of high level languages.

So what you were really doing, assuming you were not working in machine 
language :-) was to prove some properties under some set of assumptions 
about mapping of language operations to IEEE operations. Yes, I know 
that the latest C standard does have something to say about that, so you 
could also be referring to that standard, but you can't just say you are 
referring to the IEEE standard and let it go at that.

So you can't just say "accurate floating-point means IEEE". That's not
good enough, you have to further specify what you expect in terms of the
semantics of the language you are using.

For a rather complete discussion of these issues see Sam Figueroa's 
thesis: http://www.cs.nyu.edu/csweb/Research/Theses/figueroa_sam.pdf
("A Rigorous Framework for Fully Supporting the IEEE Standard for 
Floating-Point Arithmetic in High-Level Programming Languages").



* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:45 Richard Kenner
  2003-12-03 16:03 ` Scott Robert Ladd
  2003-12-03 16:19 ` Robert Dewar
@ 2003-12-03 16:38 ` Felix Lee
  2 siblings, 0 replies; 66+ messages in thread
From: Felix Lee @ 2003-12-03 16:38 UTC (permalink / raw)
  To: gcc

kenner@vlsi1.ultra.nyu.edu (Richard Kenner):
> Right now, I count about 20 different optimizations switches and
> 2**20 is significantly larger than 3**5!

but the -f switches are at a different conceptual level than -O
switches.  each -f switch can be tested independently as to
whether it does the thing promised or not.  like -funroll-loops
is a statement that loops will be unrolled, and there's no
promise being made that the result will have any particular
quality other than "loops are unrolled".

the -O switches are statements about more abstract qualities, and
if combinations of them are meaningful, they all should be
checked.  like, "-Ofast=3 -Osmall=2" should not be slower but can
be larger than "-Ofast=2 -Osmall=3", etc.

(checking whether combinations of -f switches produce correct
code is a separate issue, independent of the problem of checking
whether combinations of -O switches are meaningful.)
--


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:12     ` Robert Dewar
  2003-12-03 16:26       ` Ian Lance Taylor
@ 2003-12-03 16:32       ` Scott Robert Ladd
  2003-12-03 16:53         ` Robert Dewar
  1 sibling, 1 reply; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-03 16:32 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Ian Lance Taylor, gcc mailing list

Robert Dewar wrote:
> You make the mistake of assuming that accurate floating-point is 
> well-defined. That's far from the case. There are many possible 
> interpretations.

> Indeed often the issue is that optimization *increases* accuracy, for 
> example by using 80-bit intermediate results on the x86.

And then you have the IEEE-754/IEC-60559 purists who insist that 
using the 80-bit registers is a platform-specific optimization that 
destroys reproducibility.

> Different motivations here are:
> 
> 1. Keep the results the same when optimized
> 
> 2. Don't worry too much about precise fpt semantics, do things fast
> 
> 3. Optimize as much as you can, but respecting language semantics (for
> example Ada specifically allows the extra precision mentioned above).

Different programming languages -- even different *versions* of the same 
language -- may differ in their requirements for floating-point.

> Nothing is simple in this area :-)

Tell me about it! I'm trying to develop an accuracy benchmark, after 
perusing what's available and finding nothing truly satisfactory. Even 
Kahan's old Paranoia benchmark is lacking in many ways.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing



* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 16:12     ` Robert Dewar
@ 2003-12-03 16:26       ` Ian Lance Taylor
  2003-12-03 16:42         ` Robert Dewar
  2003-12-03 16:32       ` Scott Robert Ladd
  1 sibling, 1 reply; 66+ messages in thread
From: Ian Lance Taylor @ 2003-12-03 16:26 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Scott Robert Ladd, gcc mailing list

Robert Dewar <dewar@gnat.com> writes:

>   > Would we really want -Oaccuracy to range from 0 to 3?  It seems to me
> > that you either care about accurate floating point, or you do not.
> > It's hard for me to see why anybody would set -Oaccuracy to anything
> > other than 0 or 3.  Even if they did, it's hard for me to see how they
> > could possibly understand what was going to happen.
> 
> You make the mistake of assuming that accurate floating-point is
> well-defined. That's far from the case. There are many possible
> interpretations.

Granted.  I was thinking of the ability to prove algorithm
characteristics based on the IEEE spec, which is something I've done
in the past.  To me that is accurate floating point.  But I can see
that others may think that accurate floating point means something
else.

> Indeed often the issue is that optimization *increases* accuracy, for
> example by using 80-bit intermediate results on the x86.
> 
> Different motivations here are:
> 
> 1. Keep the results the same when optimized
> 
> 2. Don't worry too much about precise fpt semantics, do things fast
> 
> 3. Optimize as much as you can, but respecting language semantics (for
> example Ada specifically allows the extra precision mentioned above).

I think this indicates that -Oaccuracy doesn't make sense.  I also
think you left out the option which matters for C/C++ numeric
programming, which is strict IEEE semantics even though the language
does not require them.

Ian


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:45 Richard Kenner
  2003-12-03 16:03 ` Scott Robert Ladd
@ 2003-12-03 16:19 ` Robert Dewar
  2003-12-03 16:38 ` Felix Lee
  2 siblings, 0 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-03 16:19 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

Richard Kenner wrote:

>     Well as part of this thought experiment, please consider that if each
>     of these five axes has three settings, then there are 3**5
>     combinations, which is a large number, and of course there is no
>     possibility of the compiler behaving in 3**5 significantly different ways.
> 
> Right now, I count about 20 different optimizations switches and
> 2**20 is significantly larger than 3**5!

That's not the right comparison. If you go this route, then you need a
manual that, for each of the 243 combinations of optimization switches,
tells you which of these 2**20 possibilities applies to that combination.

Sounds like a mess to me (actually I count more than 20 optimization 
options :-)



* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:44   ` Ian Lance Taylor
  2003-12-03 15:59     ` Scott Robert Ladd
@ 2003-12-03 16:12     ` Robert Dewar
  2003-12-03 16:26       ` Ian Lance Taylor
  2003-12-03 16:32       ` Scott Robert Ladd
  1 sibling, 2 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-03 16:12 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Scott Robert Ladd, gcc mailing list

  > Would we really want -Oaccuracy to range from 0 to 3?  It seems to me
> that you either care about accurate floating point, or you do not.
> It's hard for me to see why anybody would set -Oaccuracy to anything
> other than 0 or 3.  Even if they did, it's hard for me to see how they
> could possibly understand what was going to happen.

You make the mistake of assuming that accurate floating-point is 
well-defined. That's far from the case. There are many possible 
interpretations.

Indeed often the issue is that optimization *increases* accuracy, for 
example by using 80-bit intermediate results on the x86.

Different motivations here are:

1. Keep the results the same when optimized

2. Don't worry too much about precise fpt semantics, do things fast

3. Optimize as much as you can, but respecting language semantics (for
example Ada specifically allows the extra precision mentioned above).

Nothing is simple in this area :-)

* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:45 Richard Kenner
@ 2003-12-03 16:03 ` Scott Robert Ladd
  2003-12-03 16:19 ` Robert Dewar
  2003-12-03 16:38 ` Felix Lee
  2 siblings, 0 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-03 16:03 UTC (permalink / raw)
  To: Richard Kenner; +Cc: dewar, gcc

Richard Kenner wrote:
>     Well as part of this thought experiment, please consider that if each
>     of these five axes has three settings, then there are 3**5
>     combinations, which is a large number, and of course there is no
>     possibility of the compiler behaving in 3**5 significantly different ways.
> 
> Right now, I count about 20 different optimizations switches and
> 2**20 is significantly larger than 3**5!
> 

And as I have pointed out in my Acovea article, my second-generation 
tests were performed with 64 different options, some of which had more 
than one state (e.g., -finline-limit, -mfpmath). Even assuming a simple 
on/off setting and no changes in parameterized options (-finline-limit), 
that's 2**64, or 18,446,744,073,709,551,616 combinations.

Which is why I wrote Acovea in the first place. I'm refining and 
expanding the algorithm to search for code size and accuracy in addition 
to generated code speed; results are still computing. I suspect it is 
more likely to find a happy medium for general options than it is to 
define scaling for several optimization axes.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing



* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:12 ` Scott Robert Ladd
  2003-12-03 15:27   ` Robert Dewar
  2003-12-03 15:44   ` Ian Lance Taylor
@ 2003-12-03 16:02   ` Paul Jarc
  2003-12-03 16:46     ` Felix Lee
  2003-12-04  0:35   ` Jan Hubicka
  3 siblings, 1 reply; 66+ messages in thread
From: Paul Jarc @ 2003-12-03 16:02 UTC (permalink / raw)
  To: gcc mailing list

Scott Robert Ladd <coyote@coyotegulch.com> wrote:
>    * compilation speed (-Ospeed)
>    * debugability      (-Odebug)
>    * code size	      (-Osize)
>    * execution speed   (-Ofast)
>    * FP accuracy       (-Oaccuracy)
>
> The switches would be weighted; for example, specifying -Osize=0
> -Ospeed=2 -Oaccuracy=3 would request the highest-accuracy, with speed as
> a secondary consideration, and no concern for code size.

I think "time" might be less confusing than "speed".

Do we really need to weight the dimensions, or is it enough to rank
them?  If we can say "accuracy is more important than compilation
time, which is more important than debuggability and execution speed
(which are of equal importance), which are more important than code
size", would that give us enough of a semantic range?


paul


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:44   ` Ian Lance Taylor
@ 2003-12-03 15:59     ` Scott Robert Ladd
  2003-12-03 16:12     ` Robert Dewar
  1 sibling, 0 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-03 15:59 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc mailing list

Ian Lance Taylor wrote:
> Would we really want -Oaccuracy to range from 0 to 3?  It seems to me
> that you either care about accurate floating point, or you do not.

Having studied the issue at length, I've concluded that... it's complicated.

I guess I'm lumping several related concepts under "accuracy." For 
example, a certain class of users wants absolute IEEE-754/IEC-60559 
conformance, for consistent results across platforms; other people want 
the most accurate results for their specific platform.

> Also -Ospeed and -Ofast will be confused with each other, so one of
> them needs to be renamed.

Correct; time to get out the thesaurus, I guess -- or go with longer 
names, like -Ocompile-speed, -Ogenerated-code-speed, or 
-Omake-the-fastest-program-you-can-no-matter-what.

Some of the -f options are almost that lengthy... ;)

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing



* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-03 15:45 Richard Kenner
  2003-12-03 16:03 ` Scott Robert Ladd
                   ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Richard Kenner @ 2003-12-03 15:45 UTC (permalink / raw)
  To: dewar; +Cc: gcc

    Well as part of this thought experiment, please consider that if each
    of these five axes has three settings, then there are 3**5
    combinations, which is a large number, and of course there is no
    possibility of the compiler behaving in 3**5 significantly different ways.

Right now, I count about 20 different optimizations switches and
2**20 is significantly larger than 3**5!


* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:12 ` Scott Robert Ladd
  2003-12-03 15:27   ` Robert Dewar
@ 2003-12-03 15:44   ` Ian Lance Taylor
  2003-12-03 15:59     ` Scott Robert Ladd
  2003-12-03 16:12     ` Robert Dewar
  2003-12-03 16:02   ` Paul Jarc
  2003-12-04  0:35   ` Jan Hubicka
  3 siblings, 2 replies; 66+ messages in thread
From: Ian Lance Taylor @ 2003-12-03 15:44 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc mailing list

Scott Robert Ladd <coyote@coyotegulch.com> writes:

> A thought experiment: Allow a complexity of switches for those of us
> that want to fine-tune code, while implementing "general purpose"
> switches (e.g., -Os/-Ospeed) for people who want less complexity in their
> use of GCC.
> 
> Based on personal experience and the conversation herein, optimization
> appears to be a five-dimensional space with these axes:
> 
>    * compilation speed (-Ospeed)
>    * debugability      (-Odebug)
>    * code size	      (-Osize)
>    * execution speed   (-Ofast)
>    * FP accuracy       (-Oaccuracy)
> 
> The switches would be weighted; for example, specifying -Osize=0
> -Ospeed=2 -Oaccuracy=3 would request the highest-accuracy, with speed as
> a secondary consideration, and no concern for code size.

Would we really want -Oaccuracy to range from 0 to 3?  It seems to me
that you either care about accurate floating point, or you do not.
It's hard for me to see why anybody would set -Oaccuracy to anything
other than 0 or 3.  Even if they did, it's hard for me to see how they
could possibly understand what was going to happen.  The reason to use
accurate floating point is to be able to prove characteristics about
your algorithm.  If you use anything other than -Oaccuracy=3, you
would need a clear understanding of precisely what was going to
happen.  In such cases, you more or less have to use a specific
option, such as -mfused-madd.

I'm also not sure about -Ospeed.  For compilation speed, the options I
see would be ``fast as possible'' or ``don't care.''  I suppose I can
imagine a setting for ``use exponential algorithms,'' aka the
24-hour-compile option, but in general I think people would more
effectively use -f options to select the particular slow algorithms
they were interested in.

Also, I think it would be quite difficult to maintain consistency of
these options over time.  For any given optimization, there will be a
lot of knobs to check in considering whether to implement the
optimization.  Different people will make different decisions.  I
think this will tend to mute the effects.  One way to avoid this would
be to have each optimization controlled by a specific variable, and
then have some matrix in which the -O... options set the optimization
control variables.  But that would be a moderately complex approach.
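
A minimal sketch of that matrix idea (the names here are hypothetical,
not actual GCC internals, though the style mirrors GCC's existing
flag_* variables):

/* One control variable per optimization; a single table maps each -O
   level to a complete setting of the controls, so policy lives in one
   place instead of scattered conditionals.  */
struct opt_controls
{
  int crossjumping;
  int gcse;
  int unroll_loops;
};

static const struct opt_controls opt_matrix[] =
{
  /* -O0 */ { 0, 0, 0 },
  /* -O1 */ { 0, 0, 0 },
  /* -O2 */ { 1, 1, 0 },
  /* -O3 */ { 1, 1, 1 },
};

struct opt_controls active_opts;

void
set_optimize_level (int level)
{
  if (level < 0)
    level = 0;
  if (level > 3)
    level = 3;
  active_opts = opt_matrix[level];
  /* Individual -f options would then override single fields.  */
}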

Also -Ospeed and -Ofast will be confused with each other, so one of
them needs to be renamed.

Ian

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:12 ` Scott Robert Ladd
@ 2003-12-03 15:27   ` Robert Dewar
  2003-12-03 15:44   ` Ian Lance Taylor
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-03 15:27 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc mailing list

Scott Robert Ladd wrote:

> Richard Kenner wrote:
> 
>> I don't see why it would be any worse than all the -f options we
>> currently have.
> 
> 
> I agree.
> 
> A thought experiment: Allow a complexity of switches for those of us
> that want to fine-tune code, while implementing "general purpose"
> switches (e.g., -Os/-Ospeed) for people who want less complexity in
> their use of GCC.
> 
> Based on personal experience and the conversation herein, optimization
> appears to be a five-dimensional space with these axes:
> 
>   * compilation speed (-Ospeed)
>   * debugability      (-Odebug)
>   * code size          (-Osize)
>   * execution speed   (-Ofast)
>   * FP accuracy       (-Oaccuracy)

Well, as part of this thought experiment, please consider that if each of
these five axes has three settings, then there are 3**5 combinations,
which is a large number, and of course there is no possibility of the
compiler behaving in 3**5 significantly different ways.

The above five-axis layout is indeed helpful as a framework, but in 
practice I think we need to find a few useful points in this space and
pin them down, and then try to make the compiler useful at those points.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 15:07 Richard Kenner
@ 2003-12-03 15:12 ` Scott Robert Ladd
  2003-12-03 15:27   ` Robert Dewar
                     ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-03 15:12 UTC (permalink / raw)
  To: gcc mailing list

Richard Kenner wrote:
> I don't see why it would be any worse than all the -f options we
> currently have.

I agree.

A thought experiment: Allow a complexity of switches for those of us
that want to fine-tune code, while implementing "general purpose"
switches (e.g., -Os/-Ospeed) for people who want less complexity in
their use of GCC.

Based on personal experience and the conversation herein, optimization
appears to be a five-dimensional space with these axes:

   * compilation speed (-Ospeed)
   * debugability      (-Odebug)
   * code size	      (-Osize)
   * execution speed   (-Ofast)
   * FP accuracy       (-Oaccuracy)

The switches would be weighted; for example, specifying -Osize=0
-Ospeed=2 -Oaccuracy=3 would request the highest-accuracy, with speed as
a secondary consideration, and no concern for code size.

Hmmm... I'm still not certain whether that's too complicated. For right
now, I'm just tossing out ideas; working out the interactions of the
axes would be the next step, if the model makes sense.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-03 15:07 Richard Kenner
  2003-12-03 15:12 ` Scott Robert Ladd
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Kenner @ 2003-12-03 15:07 UTC (permalink / raw)
  To: gp; +Cc: gcc

    > So, for instance, you could say
    > 
    >     gcc ..... -Ospeed=3 -Ospace=1 -Odebug=0 -Ocompiletime=0
 
    I'm afraid that would be a QA nightmare.

I don't see why it would be any worse than all the -f options we
currently have.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-03 11:55 ` Gerald Pfeifer
@ 2003-12-03 11:58   ` Scott A Crosby
  0 siblings, 0 replies; 66+ messages in thread
From: Scott A Crosby @ 2003-12-03 11:58 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Gareth McCaughan, gcc

On Wed, 3 Dec 2003 11:29:15 +0100 (CET), Gerald Pfeifer <gp@suse.de> writes:

> On Tue, 2 Dec 2003, Gareth McCaughan wrote:
> > There are several different axes along which the kind
> > of code a compiler generates can vary:
> > [...]
> > So, for instance, you could say
> > 
> >     gcc ..... -Ospeed=3 -Ospace=1 -Odebug=0 -Ocompiletime=0
>  
> I'm afraid that would be a QA nightmare.

It could have an internal implementation that is simpler. Even though
the user-visible behavior appears to have many options, the
implementation may have a small number of cases.

The CMUCL Python compiler uses this information in rules like:

' when 3 > speed > safety  use approximate type checking'
' when 3 = speed > safety  remove type checking on functions'
' when space >= speed .....'
' Behavior is undefined when more than one parameter is set to 3'
' when speed=3 and compiletime = 0 ......'


Scott

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 13:01 Gareth McCaughan
  2003-12-02 14:15 ` Felix Lee
  2003-12-02 14:19 ` Scott Robert Ladd
@ 2003-12-03 11:55 ` Gerald Pfeifer
  2003-12-03 11:58   ` Scott A Crosby
  2 siblings, 1 reply; 66+ messages in thread
From: Gerald Pfeifer @ 2003-12-03 11:55 UTC (permalink / raw)
  To: Gareth McCaughan; +Cc: gcc

On Tue, 2 Dec 2003, Gareth McCaughan wrote:
> There are several different axes along which the kind
> of code a compiler generates can vary:
> [...]
> So, for instance, you could say
> 
>     gcc ..... -Ospeed=3 -Ospace=1 -Odebug=0 -Ocompiletime=0
 
I'm afraid that would be a QA nightmare.
 
Gerald

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 22:25                   ` Scott Robert Ladd
  2003-12-02 22:29                     ` Eric Christopher
@ 2003-12-02 22:47                     ` Ian Lance Taylor
  1 sibling, 0 replies; 66+ messages in thread
From: Ian Lance Taylor @ 2003-12-02 22:47 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc

Scott Robert Ladd <coyote@coyotegulch.com> writes:

> 4) Is it reasonable to perform different optimizations on different
> architectures? It seems to me that a PowerPC or Opteron has very
> different characteristics from a P4, but maybe I'm splitting hairs.

Yes.  Many ports already use the OPTIMIZATION_OPTIONS target macro to
adjust specific optimizations based on the optimization level.  Also,
several ports have machine specific optimization options which may
have interesting effects on the results:
    http://gcc.gnu.org/onlinedocs/gcc-3.3.2/gcc/Submodel-Options.html#Submodel%20Options

Since you mention a Mac, for the PowerPC there are options like
-mmultiple, -mstring and -mfused-madd.

Ian

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 22:25                   ` Scott Robert Ladd
@ 2003-12-02 22:29                     ` Eric Christopher
  2003-12-02 22:47                     ` Ian Lance Taylor
  1 sibling, 0 replies; 66+ messages in thread
From: Eric Christopher @ 2003-12-02 22:29 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc mailing list


> 4) Is it reasonable to perform different optimizations on different 
> architectures? It seems to me that a PowerPC or Opteron has very 
> different characteristics from a P4, but maybe I'm splitting hairs.

I like the idea, but it really hurts when trying to debug something on
multiple platforms - i.e. "this shows up with -Ox" might fail to
reproduce on one platform because it uses a different set of
optimizations, rather than because of a genuine target difference.

-eric

-- 
Eric Christopher <echristo@redhat.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 16:01                 ` Felix Lee
@ 2003-12-02 22:25                   ` Scott Robert Ladd
  2003-12-02 22:29                     ` Eric Christopher
  2003-12-02 22:47                     ` Ian Lance Taylor
  0 siblings, 2 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-02 22:25 UTC (permalink / raw)
  Cc: gcc mailing list

I'm working out a plan of action in terms of finding better option sets 
for the various -O levels. The comments in this thread have been most 
useful.

1) I can analyze for executable code size, compile speed, and execution 
time; I'm adding such options to Acovea.

2) Testing only on Intel chips is a severe limitation; I'm updating 
Acovea for my UltraSparc IIi system, and I have a G4 Mac coming 
available in the next week or so. In the next few days, I'll publish an 
update to Acovea, so people with Alphas and such can test on their 
boxes, if they want to.

3) In addition to the other categories, I'm also going to look at a 
-Oaccuracy option. As it stands now, options that may cause inaccuracies 
are bundled inside -ffast-math; I've seen some evidence of accuracy 
problems with other options as well, but want to confirm my anecdotal 
evidence with some real tests.

4) Is it reasonable to perform different optimizations on different 
architectures? It seems to me that a PowerPC or Opteron has very 
different characteristics from a P4, but maybe I'm splitting hairs.


-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 21:00                   ` Joe Buck
@ 2003-12-02 21:29                     ` Robert Dewar
  0 siblings, 0 replies; 66+ messages in thread
From: Robert Dewar @ 2003-12-02 21:29 UTC (permalink / raw)
  To: Joe Buck; +Cc: David Carlton, Zack Weinberg, gcc mailing list

Joe Buck wrote:

> The main purpose of -O0 is for debugging. The user of -O0 wants a
> fast compile and wants to be able to effectively debug.  Accordingly,
> it seems to me that intra-statement optimization is fine, as long as,
> at any point where the user could set a breakpoint, the state matches
> the code (generally meaning that in-memory objects have the correct
> state).  However, what the user would not want is for -O0 to get slower.

We certainly do not want any significant slowdowns.  However, I suspect
that even a very mild attempt to remove junk code would cut down the size
noticeably, and that helps speed things up, for example in the link step.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 19:01                 ` Robert Dewar
@ 2003-12-02 21:00                   ` Joe Buck
  2003-12-02 21:29                     ` Robert Dewar
  0 siblings, 1 reply; 66+ messages in thread
From: Joe Buck @ 2003-12-02 21:00 UTC (permalink / raw)
  To: Robert Dewar; +Cc: David Carlton, Zack Weinberg, gcc mailing list

On Tue, Dec 02, 2003 at 01:54:01PM -0500, Robert Dewar wrote:
> One real problem with gcc is that -O0 is too painfully stupid. It 
> generates piles of junk code.

The main purpose of -O0 is for debugging. The user of -O0 wants a
fast compile and wants to be able to effectively debug.  Accordingly,
it seems to me that intra-statement optimization is fine, as long as,
at any point where the user could set a breakpoint, the state matches
the code (generally meaning that in-memory objects have the correct
state).  However, what the user would not want is for -O0 to get slower.
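
As a tiny illustration of the boundary being drawn here (an invented
example, not from the original message): folding within one statement
is debug-safe, because memory state at every statement boundary still
matches the source.

/* Intra-statement CSE may compute a * 2 only once, but at a
   breakpoint on the return, t already holds its final value, so the
   debugger's view agrees with the source.  */
int
f (int a)
{
  int t = (a * 2) + (a * 2);
  return t;
}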

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 20:15                 ` Geoff Keating
@ 2003-12-02 20:36                   ` David Carlton
  0 siblings, 0 replies; 66+ messages in thread
From: David Carlton @ 2003-12-02 20:36 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc mailing list

On 02 Dec 2003 11:48:30 -0800, Geoff Keating <geoffk@geoffk.org> said:
> David Carlton <carlton@kealia.com> writes:

>> This is obviously a very special case, but dead code elimination
>> sometimes makes it difficult to write tests for GDB's test suite.  And
>> even when working on real programs I occasionally insert dead code as
>> a place where I can set breakpoints.  So, personally, I'd prefer that
>> -O0 be pretty stupid.  (Though I don't mind if it's not the default.)

> I thought that the dead code elimination at -O0 is supposed to work
> properly with GDB: it puts in a 'nop' that the breakpoint can target.
> Is this not working?

Not always.  I went and dug up the original thread, and trimmed the
resulting program a little bit.  Compile the program after my
signature (using gcc -g -O0; I've seen this on stock GCC 3.1 and on
Red Hat's 3.2, though I haven't tried more recent GCCs), and run gdb
on it, and do 'break main' and then 'run': it skips everything until
the call to 'wack_struct_1'.  And looking at the assembly code, 'main'
is a lot shorter than I would wish.

David Carlton
carlton@kealia.com

/* Check that GDB can correctly update a value, living in a register,
   in the target.  This pretty much relies on the compiler taking heed
   of requests for values to be stored in registers.  */

static char
add_char (register char u, register char v)
{
  return u + v;
}

static short
add_short (register short u, register short v)
{
  return u + v;
}

static int
add_int (register int u, register int v)
{
  return u + v;
}

static long
add_long (register long u, register long v)
{
  return u + v;
}

static float
add_float (register float u, register float v)
{
  return u + v;
}

static double
add_double (register double u, register double v)
{
  return u + v;
}

/* */

static char
wack_char (register char u, register char v)
{
  register char l = u;
  l = add_char (l, v);
  return l;
}

static short
wack_short (register short u, register short v)
{
  register short l = u;
  l = add_short (l, v);
  return l;
}

static int
wack_int (register int u, register int v)
{
  register int l = u;
  l = add_int (l, v);
  return l;
}

static long
wack_long (register long u, register long v)
{
  register long l = u;
  l = add_long (l, v);
  return l;
}

static float
wack_float (register float u, register float v)
{
  register float l = u;
  l = add_float (l, v);
  return l;
}

static double
wack_double (register double u, register double v)
{
  register double l = u;
  l = add_double (l, v);
  return l;
}

struct s_1 { short s[1]; } z_1, s_1;

static struct s_1
add_struct_1 (struct s_1 s)
{
  int i;
  for (i = 0; i < sizeof (s) / sizeof (s.s[0]); i++)
    {
      s.s[i] = s.s[i] + s.s[i];
    }
  return s;
}

static struct s_1
wack_struct_1 (void)
{
  int i; register struct s_1 u = z_1;
  for (i = 0; i < sizeof (s_1) / sizeof (s_1.s[0]); i++) { s_1.s[i] = i + 1; }
  u = add_struct_1 (u);
  return u;
}

int
main ()
{
  /* These calls are for current frame test.  */
  wack_char (1, 2);
  wack_short (1, 2);
  wack_int (1, 2);
  wack_long (1, 2);
  wack_float (1, 2);
  wack_double (1, 2);

  /* These calls are for up frame.  */
  wack_char (1, 2);
  wack_short (1, 2);
  wack_int (1, 2);
  wack_long (1, 2);
  wack_float (1, 2);
  wack_double (1, 2);

  /* These calls are for current frame test.  */
  wack_struct_1 ();

  return 0;
}

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 17:09               ` David Carlton
  2003-12-02 17:23                 ` Zack Weinberg
  2003-12-02 19:01                 ` Robert Dewar
@ 2003-12-02 20:15                 ` Geoff Keating
  2003-12-02 20:36                   ` David Carlton
  2 siblings, 1 reply; 66+ messages in thread
From: Geoff Keating @ 2003-12-02 20:15 UTC (permalink / raw)
  To: David Carlton; +Cc: gcc mailing list

David Carlton <carlton@kealia.com> writes:

> On Tue, 02 Dec 2003 00:26:07 -0800, "Zack Weinberg"
> <zack@codesourcery.com> said:
> 
> > -O0    No optimization whatsoever.  
> >        Except maybe do obviously-dead code elimination.
> 
> This is obviously a very special case, but dead code elimination
> sometimes makes it difficult to write tests for GDB's test suite.  And
> even when working on real programs I occasionally insert dead code as
> a place where I can set breakpoints.  So, personally, I'd prefer that
> -O0 be pretty stupid.  (Though I don't mind if it's not the default.)

I thought that the dead code elimination at -O0 is supposed to work
properly with GDB: it puts in a 'nop' that the breakpoint can target.
Is this not working?

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 17:09               ` David Carlton
  2003-12-02 17:23                 ` Zack Weinberg
@ 2003-12-02 19:01                 ` Robert Dewar
  2003-12-02 21:00                   ` Joe Buck
  2003-12-02 20:15                 ` Geoff Keating
  2 siblings, 1 reply; 66+ messages in thread
From: Robert Dewar @ 2003-12-02 19:01 UTC (permalink / raw)
  To: David Carlton; +Cc: Zack Weinberg, gcc mailing list

David Carlton wrote:

> This is obviously a very special case, but dead code elimination
> sometimes makes it difficult to write tests for GDB's test suite.  And
> even when working on real programs I occasionally insert dead code as
> a place where I can set breakpoints.  So, personally, I'd prefer that
> -O0 be pretty stupid.  (Though I don't mind if it's not the default.)

I don't see that convenience of the GDB test suite should be a 
determining factor.

One real problem with gcc is that -O0 is too painfully stupid. It 
generates piles of junk code.

This has three downsides:

1. In some environments, it is a policy not to use any optimization 
switches (typically this is because of bad previous experience with 
various optimizing compilers). This means that gcc gets compared to
other compilers in "non-optimizing" mode and in such comparisons, gcc
does poorly. Yes, this is unfair, but life is unfair, and you still
end up losing a potential gcc user.

2. Simple minded compiler benchmarks are often done with default 
settings, and the performance of gcc in such settings is often
poor. This is related to 1, and is the same issue technically but
is a little different from a documentation and marketing point of
view.

3. With the current state of technology, you really can't do very much
with GDB at other than -O0. Yes, hackers who know what is going on can
manage to debug at -O1 or even -O2, and do all the time -- I usually am
running gdb on optimized code and I learn what to trust and not to 
trust. But users who actually expect the debugger to work, and who 
operate at a high semantic level can find the behavior quite 
unacceptable at -O1 and -O2. So this drives the use of unoptimized
code for debugging, but this can lead to painfully big (hundreds of
megabytes) images which are unwieldy.

We have talked before of something like -Od, which would say do all the
optimization you can without compromising debuggability. This could very
well be the default; then -O0 could *really* deoptimize, including not
deleting dead code, and any other horrible things we can think of :-)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 17:23                 ` Zack Weinberg
  2003-12-02 17:31                   ` David Edelsohn
@ 2003-12-02 17:44                   ` David Carlton
  1 sibling, 0 replies; 66+ messages in thread
From: David Carlton @ 2003-12-02 17:44 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc mailing list

On Tue, 02 Dec 2003 09:18:29 -0800, "Zack Weinberg" <zack@codesourcery.com> said:
> David Carlton <carlton@kealia.com> writes:
>> On Tue, 02 Dec 2003 00:26:07 -0800, "Zack Weinberg"
>> <zack@codesourcery.com> said:

>>> -O0    No optimization whatsoever.  
>>> Except maybe do obviously-dead code elimination.
>> 
>> This is obviously a very special case, but dead code elimination
>> sometimes makes it difficult to write tests for GDB's test suite.  And
>> even when working on real programs I occasionally insert dead code as
>> a place where I can set breakpoints.  So, personally, I'd prefer that
>> -O0 be pretty stupid.  (Though I don't mind if it's not the default.)

> That's a good point.  We seem to be trending in the direction of
> having -O0 do a little bit of optimization, not much; that is the
> main reason I threw that in.

> I think I meant unreachable code, not dead code; does that change your
> opinion?

Eliminating unreachable code should be fine.  I went and reminded
myself of the specific problems that we'd run into in the GDB test
suite, and they involved situations like this:

static int add (int a, int b) {
  return a + b;
}

int main () {
  add (1, 2);
}

The call to 'add' was being optimized out even at -O0.  Which would be
fine 99% of the time, but sometimes it's a little bit annoying.
Fortunately, if you removed the word 'static', the compiler left the
call to 'add' in place, so there was a simple workaround in this case.

>> One current side effect of optimization is that it enables lots of
>> warnings (e.g. uninitialized variable detection); if we're going to list
>> explicit goals for different optimization levels, I would have that be
>> a goal for -O1 (and of course for higher optimization levels).

> One of the reasons I suggested -O1 be default is so that users would
> get the warnings that require flow information with just -Wall.  I've
> never liked that the set of warnings issued depends on the
> optimization level.

Makes sense to me.

>>> I do not think it is appropriate to exclude optimizations from any
>>> level just because they mess up debugging info

>> I disagree with this for -O0.

> Well, -O0 shouldn't be doing any of those, just because it isn't
> supposed to be doing optimizations in general, but could you explain
> your opinion a little more?  Or is this just reiterating what you
> said above?

It seemed like you were saying that -O0 could do things that would
reduce debuggability; I do know users who have a hard time running
debuggers on code after instructions have gotten reordered or calls
have disappeared or something like that.  So I think there's real
value in having a mode that preserves maximum debuggability; if doing
so increases the complexity of the compiler and imposes a real
maintenance burden, that could be a reason to not have such a mode,
but it would make some people unhappy.

David Carlton
carlton@kealia.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 17:23                 ` Zack Weinberg
@ 2003-12-02 17:31                   ` David Edelsohn
  2003-12-02 17:44                   ` David Carlton
  1 sibling, 0 replies; 66+ messages in thread
From: David Edelsohn @ 2003-12-02 17:31 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: David Carlton, gcc mailing list

>>>>> Zack Weinberg writes:

>>> -O0    No optimization whatsoever.  
>>> Except maybe do obviously-dead code elimination.
>> 
>> This is obviously a very special case, but dead code elimination
>> sometimes makes it difficult to write tests for GDB's test suite.  And
>> even when working on real programs I occasionally insert dead code as
>> a place where I can set breakpoints.  So, personally, I'd prefer that
>> -O0 be pretty stupid.  (Though I don't mind if it's not the default.)

Zack> That's a good point.  We seem to be trending in the direction of
Zack> having -O0 do a little bit of optimization, not much; that is the
Zack> main reason I threw that in.

Zack> I think I meant unreachable code, not dead code; does that change your
Zack> opinion?

	Users normally want a fast write-compile-debug cycle at -O0.
Performing some minimal optimizations that remove useless code reduces
the number of instructions the compiler needs to process -- even just
writing out the file -- which speeds up the compilation.  Most proprietary
compilers perform some optimization at -O0.  If you really want an
optimization level that doesn't touch the code at all, probably at the
expense of even slower compilation speed, add another level below -O0.

David

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 17:09               ` David Carlton
@ 2003-12-02 17:23                 ` Zack Weinberg
  2003-12-02 17:31                   ` David Edelsohn
  2003-12-02 17:44                   ` David Carlton
  2003-12-02 19:01                 ` Robert Dewar
  2003-12-02 20:15                 ` Geoff Keating
  2 siblings, 2 replies; 66+ messages in thread
From: Zack Weinberg @ 2003-12-02 17:23 UTC (permalink / raw)
  To: David Carlton; +Cc: gcc mailing list

David Carlton <carlton@kealia.com> writes:

> On Tue, 02 Dec 2003 00:26:07 -0800, "Zack Weinberg"
> <zack@codesourcery.com> said:
>
>> -O0    No optimization whatsoever.  
>>        Except maybe do obviously-dead code elimination.
>
> This is obviously a very special case, but dead code elimination
> sometimes makes it difficult to write tests for GDB's test suite.  And
> even when working on real programs I occasionally insert dead code as
> a place where I can set breakpoints.  So, personally, I'd prefer that
> -O0 be pretty stupid.  (Though I don't mind if it's not the default.)

That's a good point.  We seem to be trending in the direction of
having -O0 do a little bit of optimization, not much; that is the
main reason I threw that in.

I think I meant unreachable code, not dead code; does that change your
opinion?
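
For what it's worth, a small invented example illustrating the
distinction:

int
example (int x)
{
  int unused = x * 2;  /* dead: computed but never used; removing it
                          loses a breakpoint target, as noted above */
  if (0)
    return -1;         /* unreachable: can never execute; removing it
                          costs the debugger nothing */
  return x;
}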

> One current side effect of optimization is that it enables lots of
> warnings (e.g. uninitialized variable detection); if we're going to list
> explicit goals for different optimization levels, I would have that be
> a goal for -O1 (and of course for higher optimization levels).

One of the reasons I suggested -O1 be default is so that users would
get the warnings that require flow information with just -Wall.  I've
never liked that the set of warnings issued depends on the
optimization level.

>> I do not think it is appropriate to exclude optimizations from any
>> level just because they mess up debugging info
>
> I disagree with this for -O0.

Well, -O0 shouldn't be doing any of those, just because it isn't
supposed to be doing optimizations in general, but could you explain
your opinion a little more?  Or is this just reiterating what you said
above?

zw

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02  9:06             ` Zack Weinberg
  2003-12-02 14:09               ` Scott Robert Ladd
  2003-12-02 14:24               ` Jan Hubicka
@ 2003-12-02 17:09               ` David Carlton
  2003-12-02 17:23                 ` Zack Weinberg
                                   ` (2 more replies)
  2 siblings, 3 replies; 66+ messages in thread
From: David Carlton @ 2003-12-02 17:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc mailing list

On Tue, 02 Dec 2003 00:26:07 -0800, "Zack Weinberg"
<zack@codesourcery.com> said:

> -O0    No optimization whatsoever.  
>        Except maybe do obviously-dead code elimination.

This is obviously a very special case, but dead code elimination
sometimes makes it difficult to write tests for GDB's test suite.  And
even when working on real programs I occasionally insert dead code as
a place where I can set breakpoints.  So, personally, I'd prefer that
-O0 be pretty stupid.  (Though I don't mind if it's not the default.)

> -O1    Optimize, but speed of compilation is more important than
>        speed or size of generated code.  Possibly this, not -O0,
>        should be the default mode.

One current side effect of optimization is that it enables lots of
warnings (e.g. uninitialized variable detection); if we're going to list
explicit goals for different optimization levels, I would have that be
a goal for -O1 (and of course for higher optimization levels).

> I do not think it is appropriate to exclude optimizations from any
> level just because they mess up debugging info

I disagree with this for -O0.

David Carlton
carlton@kealia.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 14:24               ` Jan Hubicka
@ 2003-12-02 16:01                 ` Felix Lee
  2003-12-02 22:25                   ` Scott Robert Ladd
  0 siblings, 1 reply; 66+ messages in thread
From: Felix Lee @ 2003-12-02 16:01 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Zack Weinberg, Scott Robert Ladd, Jan Hubicka, gcc mailing list

Jan Hubicka <hubicka@ucw.cz>:
> The idea of -Odebug is more foggy, so perhaps we can discuss it
> separately.  I will happily make an -Ospeed/-Ofast, -Osize, -Obalanced
> patch if we have consensus here :)

howabout -O[goal], as in
  -Onone (-O0)
  -Oquickly (-O1)
  -Ofast (-O3)
  -Osmall (-Os)
  -Obalanced (-O2)

and add -Oconfusing to turn on optimizations that will make
debugging/understanding harder.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02  9:06             ` Zack Weinberg
  2003-12-02 14:09               ` Scott Robert Ladd
@ 2003-12-02 14:24               ` Jan Hubicka
  2003-12-02 16:01                 ` Felix Lee
  2003-12-02 17:09               ` David Carlton
  2 siblings, 1 reply; 66+ messages in thread
From: Jan Hubicka @ 2003-12-02 14:24 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Scott Robert Ladd, Jan Hubicka, Jan Hubicka, gcc mailing list

> Scott Robert Ladd <coyote@coyotegulch.com> writes:
> 
> > In my mind, GCC can support more than one constituency; I would very
> > much like to see a -Ospeed, -Osize, and -Obalanced switches, for
> > example, to provide specific optimization sets for given audiences.
> 
> I think this is a good idea.
> 
> I'd like to point out that there are Makefiles all over the place that
> are hardwired to -O2.  So -O2 needs to do something sane in the general
> case.  I would suggest that -O2 and your -Obalanced be synonymous.
> 
> I don't think it is a good idea to add more -O[0-9] levels.
> 
> There is another axis to consider, namely compile time, which is what
> -O1/2/3 are *supposed* to trade off now.  This becomes less and less
> important as hardware gets faster, but should not be ignored, since
> gcc is well known to be too slow even on fast hardware.
> 
> My suggested constellation of -O switches:
> 
> -O0    No optimization whatsoever.  
>        Except maybe do obviously-dead code elimination.
> -O1    Optimize, but speed of compilation is more important than
>        speed or size of generated code.  Possibly this, not -O0,
>        should be the default mode.
> 
> -O3/-Ospeed
>        Optimize for speed at the expense of size.
> -Os/-Osize
>        Optimize for size at the expense of speed.
> -O2/-Obalanced
>        Produce a balance of speed and size optimizations acceptable
>        for most code.

I would completely agree with this.  Perhaps we can use -Ofast,
so people won't mistake the existing -Os for a shortcut for -Ospeed.
It is not very English, but perhaps less error-prone.
Also -O3 does not include loop unrolling and -fomit-frame-pointer for
some reason.  I think we ought to consider including them.
> 
> Two factors that are *not* considered in any of these switches are
> ease of debugging, and scope (function/unit/program) of optimization.

I was thinking about this too.  The solution may be to introduce
-Odebug/-Ono-debug specifiers that control the subset of optimizations
that make debugging nasty.
However, it is difficult to draw the line here.  Clearly -fweb is nasty
until we have location lists.  But is, for instance, tail call
optimization nasty or not?  It breaks unwind info.  Similarly for
register allocation, and one doesn't know where to stop.
The idea of -Odebug is more foggy, so perhaps we can discuss it
separately.  I will happily make an -Ospeed/-Ofast, -Osize, -Obalanced
patch if we have consensus here :)

Honza
> I do not think it is appropriate to exclude optimizations from any
> level just because they mess up debugging info, and scope of
> optimization is a detail that shouldn't be exposed at the level of
> these switches.  If it makes sense in terms of the speed/size/compile
> speed tradeoffs to do whole-program optimization at -O1 then we should
> do it at -O1.  We can have -f switches for that.
> 
> Incidentally, Scott, I would encourage you to submit patches for 3.4
> that adjust the default optimization sets, if you can find
> better-in-general settings.
> 
> zw

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 13:01 Gareth McCaughan
  2003-12-02 14:15 ` Felix Lee
@ 2003-12-02 14:19 ` Scott Robert Ladd
  2003-12-03 11:55 ` Gerald Pfeifer
  2 siblings, 0 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-02 14:19 UTC (permalink / raw)
  To: Gareth McCaughan; +Cc: gcc

Gareth McCaughan wrote:
> There are several different axes along which the kind
> of code a compiler generates can vary:
> 
>   - execution speed
>   - code size
>   - debuggability
>   - compilation speed[1]
>   - robustness[2]
>   - simplicity of generated code[3]

Add

     - accuracy

To your list. My current Acovea tests are investigating the effects of 
various options on floating-point accuracy.

> and so on. It seems to me that rather than having a single
> "how much optimization?" parameter and a bunch of flags
> requesting special emphasis on execution speed or code size
> or whatever, it makes more sense to let users specify how
> important each axis is to them.
> 
> So, for instance, you could say
> 
>     gcc ..... -Ospeed=3 -Ospace=1 -Odebug=0 -Ocompiletime=0

A good idea, but likely too complex for the average user. At the very 
least, we'd need continued support for the general -O0/1/2/3/s options, 
especially given how many Makefiles rely on them.

> Prior art: Common Lisp does almost exactly this, and it
> seems to work very well.

Good point. And I do think you have a good idea, if it can be 
implemented in a sane fashion such that people understand how their 
choices affect compilation.

In my experience, most programmers blindly slap -O2 or -O3 in their 
Makefiles, never rethinking their choice. You can lead a programmer to 
optimization, but you can't make them think.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02 13:01 Gareth McCaughan
@ 2003-12-02 14:15 ` Felix Lee
  2003-12-02 14:19 ` Scott Robert Ladd
  2003-12-03 11:55 ` Gerald Pfeifer
  2 siblings, 0 replies; 66+ messages in thread
From: Felix Lee @ 2003-12-02 14:15 UTC (permalink / raw)
  To: Gareth McCaughan; +Cc: gcc

Gareth McCaughan <gmccaughan@synaptics-uk.com>:
>     gcc ..... -Ospeed=3 -Ospace=1 -Odebug=0 -Ocompiletime=0

I think anyone who cares about such fine distinctions is going to
end up playing around with -f/-m flags anyway.  I don't see much
benefit to having the compiler make promises like that.  every
combination will have to be checked to make sure it acts
sensibly, for all targets.  as a user, I'd rather have just a few
canned options that are pretty well-tuned, and good documentation
of the tradeoffs (speed/space/debug/compiletime) for all the
various -f/-m flags.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-12-02  9:06             ` Zack Weinberg
@ 2003-12-02 14:09               ` Scott Robert Ladd
  2003-12-02 14:24               ` Jan Hubicka
  2003-12-02 17:09               ` David Carlton
  2 siblings, 0 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-12-02 14:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

Zack Weinberg wrote:
> Incidentally, Scott, I would encourage you to submit patches for 3.4
> that adjust the default optimization sets, if you can find
> better-in-general settings.

The problem is finding a single set of optimizations that works well for 
all types of code. A flag that optimizes one section of a program may 
pessimize another. In many of my tests, a few options vastly improve the 
performance of a given algorithm, while one or two other options 
pessimize the same code.

That said, I have tentatively identified some general improvements. I 
need to carefully look at code size and compile times before doing 
something like adding -ftracer to -O3. Then I'll likely submit a patch 
or two, even if I don't have a contract.

;)

[sarcasm: on]
Of course, now that I understand how GCC development works, I'm more 
than happy to accept any dollars (euros, yen, etc.) sent my way. I 
wouldn't want to do anything as a public service, and undermine the 
financial incentives enjoyed by so many.
[sarcasm: off]

..Scott "Please send money" Ladd

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
@ 2003-12-02 13:01 Gareth McCaughan
  2003-12-02 14:15 ` Felix Lee
                   ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Gareth McCaughan @ 2003-12-02 13:01 UTC (permalink / raw)
  To: gcc

Scott Robert Ladd wrote:

> In my mind, GCC can support more than one constituency; I would very
> much like to see -Ospeed, -Osize, and -Obalanced switches, for
> example, to provide specific optimization sets for given audiences.

I'd like to propose a slight variation on this theme.

There are several different axes along which the kind
of code a compiler generates can vary:

  - execution speed
  - code size
  - debuggability
  - compilation speed[1]
  - robustness[2]
  - simplicity of generated code[3]

and so on. It seems to me that rather than having a single
"how much optimization?" parameter and a bunch of flags
requesting special emphasis on execution speed or code size
or whatever, it makes more sense to let users specify how
important each axis is to them.

So, for instance, you could say

    gcc ..... -Ospeed=3 -Ospace=1 -Odebug=0 -Ocompiletime=0

to mean "make this run absolutely as fast as possible without
concern for debuggability or compilation time, but do take
a little trouble to avoid the generated code being enormous".
Most of the time, most users would just go on using -O2 or
whatever; the -O values would map to values for all the quality
axes. It might be worth having other pre-packaged sets of
optimization levels, though my feeling is that once you
get beyond the level of sophistication of putting "-O2" or
whatever on all your compilations, being exposed to the full
generality is unlikely to hurt.

Prior art: Common Lisp does almost exactly this, and it
seems to work very well.

[1] Compilation speed isn't really a property of the generated
    code, of course, but of the compilation process.

[2] At one extreme is making unsafe assumptions about the
    code, as many C compilers used to do when asked for
    optimization in the bad old days. At the other extreme
    is inserting bounds checks and the like. In between we
    have, e.g., -fstrict-aliasing. Possibly -ffast-math
    should be considered under this heading too.

[3] Closely related to debuggability, but maybe focusing on
    comprehensibility to humans.

-- 
Gareth McCaughan



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 23:16           ` Scott Robert Ladd
  2003-11-24  0:09             ` Jamie Lokier
@ 2003-12-02  9:06             ` Zack Weinberg
  2003-12-02 14:09               ` Scott Robert Ladd
                                 ` (2 more replies)
  1 sibling, 3 replies; 66+ messages in thread
From: Zack Weinberg @ 2003-12-02  9:06 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

Scott Robert Ladd <coyote@coyotegulch.com> writes:

> In my mind, GCC can support more than one constituency; I would very
> much like to see -Ospeed, -Osize, and -Obalanced switches, for
> example, to provide specific optimization sets for given audiences.

I think this is a good idea.

I'd like to point out that there are Makefiles all over the place that
are hardwired to -O2.  So -O2 needs to do something sane in the general
case.  I would suggest that -O2 and your -Obalanced be synonymous.

I don't think it is a good idea to add more -O[0-9] levels.

There is another axis to consider, namely compile time, which is what
-O1/2/3 are *supposed* to trade off now.  This becomes less and less
important as hardware gets faster, but should not be ignored, since
gcc is well known to be too slow even on fast hardware.

My suggested constellation of -O switches:

-O0    No optimization whatsoever.  
       Except maybe do obviously-dead code elimination.
-O1    Optimize, but speed of compilation is more important than
       speed or size of generated code.  Possibly this, not -O0,
       should be the default mode.

-O3/-Ospeed
       Optimize for speed at the expense of size.
-Os/-Osize
       Optimize for size at the expense of speed.
-O2/-Obalanced
       Produce a balance of speed and size optimizations acceptable
       for most code.

Two factors that are *not* considered in any of these switches are
ease of debugging, and scope (function/unit/program) of optimization.
I do not think it is appropriate to exclude optimizations from any
level just because they mess up debugging info, and scope of
optimization is a detail that shouldn't be exposed at the level of
these switches.  If it makes sense in terms of the speed/size/compile
speed tradeoffs to do whole-program optimization at -O1 then we should
do it at -O1.  We can have -f switches for that.

Incidentally, Scott, I would encourage you to submit patches for 3.4
that adjust the default optimization sets, if you can find
better-in-general settings.

zw

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-24  5:15               ` Scott Robert Ladd
@ 2003-11-24 20:54                 ` tm_gccmail
  0 siblings, 0 replies; 66+ messages in thread
From: tm_gccmail @ 2003-11-24 20:54 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Jamie Lokier, gcc mailing list

On Sun, 23 Nov 2003, Scott Robert Ladd wrote:

> Jamie Lokier wrote:
> > Notably, some people say that kernels optimised for size (-Os) run 
> > _faster_ than kernels optimised for "speed" (-O2).
> 
> In my experience, -Os is generally faster than -O1, although not as fast
> as -O2. I have seen cases where -O2 is substantially *slower* than -O0
> (no optimization).

Can you send me a testcase so I can examine it?

Toshi


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-24  0:09             ` Jamie Lokier
@ 2003-11-24  5:15               ` Scott Robert Ladd
  2003-11-24 20:54                 ` tm_gccmail
  0 siblings, 1 reply; 66+ messages in thread
From: Scott Robert Ladd @ 2003-11-24  5:15 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: gcc mailing list

Jamie Lokier wrote:
> Notably, some people say that kernels optimised for size (-Os) run 
> _faster_ than kernels optimised for "speed" (-O2).

In my experience, -Os is generally faster than -O1, although not as fast
as -O2. I have seen cases where -O2 is substantially *slower* than -O0
(no optimization).

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 23:16           ` Scott Robert Ladd
@ 2003-11-24  0:09             ` Jamie Lokier
  2003-11-24  5:15               ` Scott Robert Ladd
  2003-12-02  9:06             ` Zack Weinberg
  1 sibling, 1 reply; 66+ messages in thread
From: Jamie Lokier @ 2003-11-24  0:09 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Jan Hubicka, gcc mailing list

Scott Robert Ladd wrote:
> I'm certainly aware of the debates on the Linux Kernel list, in regard 
> to the size of code produced by GCC (and the compiler's compilation speed).

Notably, some people say that kernels optimised for size (-Os) run
_faster_ than kernels optimised for "speed" (-O2).

-- Jamie

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 20:11         ` Jan Hubicka
@ 2003-11-23 23:16           ` Scott Robert Ladd
  2003-11-24  0:09             ` Jamie Lokier
  2003-12-02  9:06             ` Zack Weinberg
  0 siblings, 2 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-11-23 23:16 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, gcc mailing list

Jan Hubicka wrote:
> This is really touchy issue.  Many programmers consider size very
> important, others unimportant.

I take the broader view that *both* size and speed are important, in 
their appropriate contexts (and sometimes in combination).

I'm certainly aware of the debates on the Linux Kernel list, in regard 
to the size of code produced by GCC (and the compiler's compilation speed).

In my mind, GCC can support more than one constituency; I would very 
much like to see -Ospeed, -Osize, and -Obalanced switches, for 
example, to provide specific optimization sets for given audiences.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing
In development: Alex, a database for common folk

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 18:50   ` Scott Robert Ladd
  2003-11-23 19:30     ` Jan Hubicka
@ 2003-11-23 20:57     ` Jan Hubicka
  1 sibling, 0 replies; 66+ messages in thread
From: Jan Hubicka @ 2003-11-23 20:57 UTC (permalink / raw)
  To: Scott Robert Ladd, gcc-patches; +Cc: Jan Hubicka, gcc mailing list

> Jan Hubicka wrote:
> >crossjumping has been an -O1 thing forever, but I would also agree that
> >it should be disabled at -O1.  At minimum it may get compile time
> >expensive in some cases.
> 
> If "-fcrossjumping" has been part of -O1 "forever", as you say, why
> wasn't it mentioned in the man pages prior to GCC 3.3? This is one
> reason I missed testing it in earlier incarnations of gccacovea.
> 
> Version 3.1 of Acovea (to be posted mid-week) now tests some 65 options 
> (as opposed to 55 in Acovea 3.0.0).
> 
> >Do you have testcases that are pessimized at -O2?  -fcrossjumping may
> > introduce new branches that are supposed to be eliminated by basic
> >block reordering not done at -O1.
> 
> Pentium 4 results for the huffbench.c test:
> 
> 37.8s   -O1
> 34.0s	-O1 -fno-crossjumping
> 31.7s	-O2
> 30.3s	-O2 -fno-crossjumping
> 30.7s	-O3
> 28.0s 	-O3 -fno-crossjumping
> 37.6s	-Os
> 35.4s   -Os -fno-crossjumping
Hi,
the attached patch disables crossjumping at -O1; it is still enabled at
-O2, as I do believe it has good code size/performance ratios (in the
last SPEC runs I did, -fcrossjumping was performance-neutral), but
perhaps the situation has changed.  But in that case it should be
analyzed and solved.

bootstrapped/regtested i386.

2003-11-23  Jan Hubicka  <jh@suse.cz>
	* opts.c (decode_options):  Disable crossjumping at -O1
	* invoke.texi (-O1): Document change.

Index: opts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/opts.c,v
retrieving revision 1.46
diff -c -3 -p -r1.46 opts.c
*** opts.c	21 Nov 2003 04:05:05 -0000	1.46
--- opts.c	23 Nov 2003 20:00:48 -0000
*************** decode_options (unsigned int argc, const
*** 529,541 ****
        flag_guess_branch_prob = 1;
        flag_cprop_registers = 1;
        flag_loop_optimize = 1;
-       flag_crossjumping = 1;
        flag_if_conversion = 1;
        flag_if_conversion2 = 1;
      }
  
    if (optimize >= 2)
      {
        flag_optimize_sibling_calls = 1;
        flag_cse_follow_jumps = 1;
        flag_cse_skip_blocks = 1;
--- 529,541 ----
        flag_guess_branch_prob = 1;
        flag_cprop_registers = 1;
        flag_loop_optimize = 1;
        flag_if_conversion = 1;
        flag_if_conversion2 = 1;
      }
  
    if (optimize >= 2)
      {
+       flag_crossjumping = 1;
        flag_optimize_sibling_calls = 1;
        flag_cse_follow_jumps = 1;
        flag_cse_skip_blocks = 1;
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.364
diff -c -3 -p -r1.364 invoke.texi
*** doc/invoke.texi	21 Nov 2003 11:42:58 -0000	1.364
--- doc/invoke.texi	23 Nov 2003 20:00:52 -0000
*************** compilation time.
*** 3651,3657 ****
  -fmerge-constants @gol
  -fthread-jumps @gol
  -floop-optimize @gol
- -fcrossjumping @gol
  -fif-conversion @gol
  -fif-conversion2 @gol
  -fdelayed-branch @gol
--- 3651,3656 ----
*************** also turns on the following optimization
*** 3688,3694 ****
  -fstrict-aliasing @gol
  -funit-at-a-time @gol
  -falign-functions  -falign-jumps @gol
! -falign-loops  -falign-labels}
  
  Please note the warning under @option{-fgcse} about
  invoking @option{-O2} on programs that use computed gotos.
--- 3687,3694 ----
  -fstrict-aliasing @gol
  -funit-at-a-time @gol
  -falign-functions  -falign-jumps @gol
! -falign-loops  -falign-labels @gol
! -fcrossjumping}
  
  Please note the warning under @option{-fgcse} about
  invoking @option{-O2} on programs that use computed gotos.
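
For readers unfamiliar with the transformation, a minimal sketch
(an invented example, not from the patch) of the code shape
crossjumping targets: two branches whose tails compile to identical
instruction sequences.

extern void log_event (const char *msg);
extern int finish (int value);

int
classify (int x)
{
  if (x < 0)
    {
      x = -x;              /* the heads differ ...                   */
      log_event ("done");  /* ... but both branches end in the same  */
      return finish (x);   /* instructions; crossjumping keeps one   */
    }                      /* copy of the tail and redirects the     */
  else                     /* other branch to it, trading an extra   */
    {                      /* jump for smaller code.                 */
      x *= 2;
      log_event ("done");
      return finish (x);
    }
}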

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 20:06       ` Scott Robert Ladd
@ 2003-11-23 20:11         ` Jan Hubicka
  2003-11-23 23:16           ` Scott Robert Ladd
  0 siblings, 1 reply; 66+ messages in thread
From: Jan Hubicka @ 2003-11-23 20:11 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

> >In fact I think we should have an -Ofast/-Osize pair for users who want
> >such specialized one-way tuning.
> 
> A -Ofast1 and -Ofast2 could define different sets of switches known to 
> produce fast code. Different pieces of code require different 
> optimizations; what may optimize one algorithm may pessimize another.
> 
> I'm working on a real-time video codec for a customer; it is *very* 
> significant that I can improve the program's speed, with evolved 
> options, by 25% or more over GCC's default optimization options.
> 
> Wanting the fastest possible generated code is *not* a "specialized way 
> of tuning." I find the "speed doesn't matter" attitude rather 
> disturbing; it is part-and-parcel to the code-bloat now considered 
> "acceptable" by a wide segment of the programming community. The best 

This is a really touchy issue.  Many programmers consider size very
important, others unimportant.
I do agree that we need some set of options for people doing code like
codecs or similar engines (I have written such code myself too).
I think -O3 in the current scheme is closest to this definition.  For
programs that have no problem fitting in the L1 cache it should more or
less consistently improve code speed.
It would be interesting to know which optimizations enabled by -O3 are
usually a loss for you.

Honza
> program is defined by the combination of efficient algorithms *and* 
> effective code generation.
> 
> ..Scott
> 
> -- 
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing
> In development: Alex, a database for common folk

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 19:30     ` Jan Hubicka
@ 2003-11-23 20:06       ` Scott Robert Ladd
  2003-11-23 20:11         ` Jan Hubicka
  0 siblings, 1 reply; 66+ messages in thread
From: Scott Robert Ladd @ 2003-11-23 20:06 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jan Hubicka, gcc mailing list

Jan Hubicka wrote:
> Please do.  The control flow changes do have very random effect on
> modern x86 CPUs because of the complexity involved in the instruction
> decoding stages.
 >
> Can you please also try profile feedback?  It is well possible the
> compiler is misspredicting something resulting in randomly suboptimzal
> code layout.

I'll do so in the next day or so; right now, I'm running baselines for 
the other benchmarks.

> No, -O2 is supposed to be a generally useful option, so it cannot produce
> unnecessarily big binaries.  When running common integer code (startup
> scripts, KDE, OpenOffice, Gnome, whatever normal users are supposed to
> run today), code size is very critical. Both code size and speed should
> be in balance.

So perhaps we need -Oint and -Ofloat (or better, -Oengine) options? I 
do not think we should pessimize engine code -- databases, codecs, 
servers -- by favoring only the needs of interactive and script code (or 
vice versa!)

The vast majority of developers lack the time and knowledge required to 
define specific sets of optimizations for their programs; they will use 
-O2 under the (I think valid) assumption that it produces faster code 
than does -O1. And the existence of -Os (which explicitly optimizes for 
"size") implies that -O1/2/3 optimize for speed.

> In fact I think we should have an -Ofast/-Osize pair for users who want
> such specialized one-way tuning.

A -Ofast1 and -Ofast2 could define different sets of switches known to 
produce fast code. Different pieces of code require different 
optimizations; what may optimize one algorithm may pessimize another.

I'm working on a real-time video codec for a customer; it is *very* 
significant that I can improve the program's speed, with evolved 
options, by 25% or more over GCC's default optimization options.

Wanting the fastest possible generated code is *not* a "specialized way 
of tuning." I find the "speed doesn't matter" attitude rather 
disturbing; it is part-and-parcel to the code-bloat now considered 
"acceptable" by a wide segment of the programming community. The best 
program is defined by the combination of efficient algorithms *and* 
effective code generation.

..Scott

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing
In development: Alex, a database for common folk

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 18:50   ` Scott Robert Ladd
@ 2003-11-23 19:30     ` Jan Hubicka
  2003-11-23 20:06       ` Scott Robert Ladd
  2003-11-23 20:57     ` Jan Hubicka
  1 sibling, 1 reply; 66+ messages in thread
From: Jan Hubicka @ 2003-11-23 19:30 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Jan Hubicka, gcc mailing list

> Jan Hubicka wrote:
> >crossjumping has been an -O1 thing forever, but I would also agree that
> >it should be disabled at -O1.  At minimum it may get compile time
> >expensive in some cases.
> 
> If "-fcrossjumping" has been part of -O1 "forever", as you say, why
> wasn't it mentioned in the man pages prior to GCC 3.3? This is one
> reason I missed testing it in earlier incarnations of gccacovea.
Because there was no way to disable it.
I've added -fcrossjumping because of the compilation time issues we were
running into.  The issues are solved now.
> 
> Version 3.1 of Acovea (to be posted mid-week) now tests some 65 options 
> (as opposed to 55 in Avocea 3.0.0).
> 
> >Do you have testcases that are pessimized at -O2?  -fcrossjumping may
> > introduce new branches that are supposed to be eliminated by basic 
> >block reordering not done at -O1.
> 
> Pentium 4 results for the huffbench.c test:
> 
> 37.8s   -O1
> 34.0s   -O1 -fno-crossjumping
> 31.7s   -O2
> 30.3s   -O2 -fno-crossjumping
> 30.7s   -O3
> 28.0s   -O3 -fno-crossjumping
> 37.6s   -Os
> 35.4s   -Os -fno-crossjumping
> 
> The above *strongly* suggests that -fcrossjumping is a pessimization,
> at least in the case of huffbench. I dislike basing a broad assumption
> on a single test instance; I have yet to run complete tests on the
> other four benchmarks in my suite, and when I do, I'll be able to make
> a broader
Please do.  Control-flow changes can have quite random effects on
modern x86 CPUs because of the complexity of the instruction-decoding
stages.
Can you please also try profile feedback?  It is quite possible the
compiler is mispredicting something, resulting in randomly suboptimal
code layout.
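
For example, something along these lines (flag spellings as documented
for the GCC 3.x series; huffbench.c stands in for whatever source you
are timing, and both compiles should otherwise use identical flags):

    gcc -O3 -fprofile-arcs huffbench.c -o huffbench
    ./huffbench        # run the workload; this writes the .da arc counts
    gcc -O3 -fbranch-probabilities huffbench.c -o huffbench
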
> statement.
> 
> As for my Acovea-evolved set of options:
> 
> 22.3s   -O1 -fno-crossjumping -fexpensive-optimizations \
>             -fregmove -freorder-blocks -frename-registers \
>             -fnew-ra -funroll-all-loops -fomit-frame-pointer
> 
> 
> >Crossjumping is not supposed to make code faster; it is a code-size
> >optimization, so the -O3 difference is likely showing that your code
> >is falling out of the caches. That said, it seems to me that it is a
> >good optimization for -O2, because binary size is a very important
> >factor.
> 
> According to the docs, -fcrossjumping is enabled with both -O1 and -Os.
> It seems to me that -fcrossjumping should *only* be implied by -Os, not
> -O1, given that -Os implies optimization for size, and -O1/2/3 imply
> optimization for speed.
No, -O2 is supposed to be a generally useful option, so it cannot
produce unnecessarily big binaries.  For common integer code (startup
scripts, KDE, OpenOffice, Gnome, whatever normal users run today), code
size is critical.  Code size and speed should be in balance.
In fact, I think we should have an -Ofast/-Osize pair for users who
want such specialized, one-way tuning.

Honza
> 
> -- 
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 18:19 ` Jan Hubicka
@ 2003-11-23 18:50   ` Scott Robert Ladd
  2003-11-23 19:30     ` Jan Hubicka
  2003-11-23 20:57     ` Jan Hubicka
  0 siblings, 2 replies; 66+ messages in thread
From: Scott Robert Ladd @ 2003-11-23 18:50 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc mailing list

Jan Hubicka wrote:
> Crossjumping has been an -O1 thing forever, but I would also agree that
> it should be disabled at -O1.  At minimum, it may get compile-time
> expensive in some cases.

If "-fcrossjumping" has been part of -O1 "forever", as you say, why
wasn't it mentioned in the man pages prior to GCC 3.3? This is one
reason I missed testing it in earlier incarnations of gccacovea.

Version 3.1 of Acovea (to be posted mid-week) now tests some 65 options 
(as opposed to 55 in Acovea 3.0.0).

> Do you have testcases that are pessimized at -O2?  -fcrossjumping may
> introduce new branches that are supposed to be eliminated by the basic
> block reordering not done at -O1.

Pentium 4 results for the huffbench.c test:

37.8s   -O1
34.0s   -O1 -fno-crossjumping
31.7s   -O2
30.3s   -O2 -fno-crossjumping
30.7s   -O3
28.0s   -O3 -fno-crossjumping
37.6s   -Os
35.4s   -Os -fno-crossjumping

The above *strongly* suggests that -fcrossjumping is a pessimization, at
least in the case of huffbench. I dislike basing a broad assumption on a
single test instance; I have yet to run complete tests on the other four
benchmarks in my suite, and when I do, I'll be able to make a broader
statement.

As for my Acovea-evolved set of options:

22.3s   -O1 -fno-crossjumping -fexpensive-optimizations \
             -fregmove -freorder-blocks -frename-registers \
             -fnew-ra -funroll-all-loops -fomit-frame-pointer

I'm running gcc 3.4 20031119.

The performance of my evolved option set was rather impressive -- 27% 
*faster* than -O3 alone.

My first guess was that the "-fnew-ra -funroll-all-loops 
-fomit-frame-pointer" options (not included in -O3) were primarily 
responsible for the improved performance. So I tried these two tests:

32.4s   -O3 -fnew-ra -funroll-all-loops -fomit-frame-pointer

32.6s   -O3 -fno-crossjumping \
             -fnew-ra -funroll-all-loops -fomit-frame-pointer

My guess was obviously *wrong*. So, to me, the above results suggest a 
pessimizing interaction among the options implied by -O1/2/3.

> Crossjumping is not supposed to make code faster; it is a code-size
> optimization, so the -O3 difference is likely showing that your code
> is falling out of the caches. That said, it seems to me that it is a
> good optimization for -O2, because binary size is a very important
> factor.

According to the docs, -fcrossjumping is enabled with both -O1 and -Os.
It seems to me that -fcrossjumping should *only* be implied by -Os, not
-O1, given that -Os implies optimization for size, and -O1/2/3 imply
optimization for speed.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Should -fcross-jumping be part of -O1?
  2003-11-23 18:09 Scott Robert Ladd
@ 2003-11-23 18:19 ` Jan Hubicka
  2003-11-23 18:50   ` Scott Robert Ladd
  0 siblings, 1 reply; 66+ messages in thread
From: Jan Hubicka @ 2003-11-23 18:19 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: gcc mailing list

> I've been testing a somewhat more comprehensive version of my 
> evolutionary algorithm, based on feedback from readers of my article and 
> an extended set of tested options.
> 
> In the first tests run yesterday, -fcrossjumping proved pessimistic in
> some cases. On one benchmark (huffbench), "-O3 -fno-crossjumping" is
> 7% faster than "-O3", all other flags being equal. The man page states
> that, when using -fcrossjumping, "the resulting code may or may not
> perform better than without cross-jumping."
> 
> So why is -fcrossjumping included in -O1?
Crossjumping has been an -O1 thing forever, but I would also agree that
it should be disabled at -O1.  At minimum, it may get compile-time
expensive in some cases.

Do you have testcases that are pessimized at -O2?  -fcrossjumping may
introduce new branches that are supposed to be eliminated by the basic
block reordering not done at -O1.

Crossjumping is not supposed to make code faster; it is a code-size
optimization, so the -O3 difference is likely showing that your code is
falling out of the caches. That said, it seems to me that it is a good
optimization for -O2, because binary size is a very important factor.
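
For what it's worth, here is a contrived C sketch (hypothetical, not
taken from any benchmark) of the kind of code crossjumping targets:

    /* Both arms of the conditional end in the same statements.
       Crossjumping keeps one copy of the common tail and makes the
       other arm jump into it: smaller code, but one extra taken
       branch on that path. */
    int tally(const int *buf, int n, int flag)
    {
        int sum;
        if (flag) {
            sum = 3 * buf[0];      /* arm-specific work */
            sum += buf[n - 1];     /* identical tail ... */
            sum ^= n;
        } else {
            sum = -buf[0];         /* arm-specific work */
            sum += buf[n - 1];     /* ... identical tail */
            sum ^= n;
        }
        return sum;
    }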

Honza
> 
> -- 
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing
> In development: Alex, a database for common folk

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Should -fcross-jumping be part of -O1?
@ 2003-11-23 18:09 Scott Robert Ladd
  2003-11-23 18:19 ` Jan Hubicka
  0 siblings, 1 reply; 66+ messages in thread
From: Scott Robert Ladd @ 2003-11-23 18:09 UTC (permalink / raw)
  To: gcc mailing list

I've been testing a somewhat more comprehensive version of my 
evolutionary algorithm, based on feedback from readers of my article and 
an extended set of tested options.
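
For the curious, here is a toy sketch of the search. Everything below 
is stubbed and hypothetical -- the real tool builds and times the 
actual benchmark, while the fake fitness function here merely makes 
the sketch compile and run:

    /* Toy evolutionary search over GCC option sets: an individual is
       a bitmask over a table of candidate flags; lower "runtime" is
       fitter.  Build with: gcc -std=c99 -o evolve evolve.c */
    #include <stdio.h>
    #include <stdlib.h>

    static const char *flags[] = {
        "-fno-crossjumping", "-fomit-frame-pointer",
        "-funroll-all-loops", "-frename-registers",
    };
    enum { NFLAGS = 4, POP = 8, GENS = 20 };

    /* Stand-in for "build the benchmark with these flags, run it,
       and report the elapsed time". */
    static double time_with_flags(unsigned mask)
    {
        double t = 40.0;
        for (int i = 0; i < NFLAGS; i++)
            if (mask & (1u << i))
                t -= 1.5 * (i + 1);   /* fake: each flag helps a bit */
        return t;
    }

    int main(void)
    {
        unsigned pop[POP];
        for (int i = 0; i < POP; i++)       /* random initial population */
            pop[i] = rand() & ((1u << NFLAGS) - 1);

        for (int g = 0; g < GENS; g++) {
            unsigned best = pop[0];
            for (int i = 1; i < POP; i++)   /* select fastest individual */
                if (time_with_flags(pop[i]) < time_with_flags(best))
                    best = pop[i];
            for (int i = 1; i < POP; i++)   /* refill with mutants */
                pop[i] = best ^ (1u << (rand() % NFLAGS));
            pop[0] = best;                  /* elitism: keep the winner */
        }

        printf("best set (%.1fs):", time_with_flags(pop[0]));
        for (int i = 0; i < NFLAGS; i++)
            if (pop[0] & (1u << i))
                printf(" %s", flags[i]);
        printf("\n");
        return 0;
    }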

In the first tests run yesterday, -fcrossjumping proved pessimistic in 
some cases. On one benchmark (huffbench), "-O3 -fno-crossjumping" is 7% 
faster than "-O3", all other flags being equal. The man page states 
that, when using -fcrossjumping, "the resulting code may or may not 
perform better than without cross-jumping."

So why is -fcrossjumping included in -O1?

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing
In development: Alex, a database for common folk

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2003-12-05 11:46 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-03 17:07 Should -fcross-jumping be part of -O1? Richard Kenner
2003-12-03 19:48 ` Felix Lee
  -- strict thread matches above, loose matches on Subject: below --
2003-12-05 11:46 Gareth McCaughan
2003-12-03 21:35 Stephan T. Lavavej
2003-12-03 17:26 Nathanael Nerode
2003-12-03 15:45 Richard Kenner
2003-12-03 16:03 ` Scott Robert Ladd
2003-12-03 16:19 ` Robert Dewar
2003-12-03 16:38 ` Felix Lee
2003-12-03 15:07 Richard Kenner
2003-12-03 15:12 ` Scott Robert Ladd
2003-12-03 15:27   ` Robert Dewar
2003-12-03 15:44   ` Ian Lance Taylor
2003-12-03 15:59     ` Scott Robert Ladd
2003-12-03 16:12     ` Robert Dewar
2003-12-03 16:26       ` Ian Lance Taylor
2003-12-03 16:42         ` Robert Dewar
2003-12-03 17:03           ` Ian Lance Taylor
2003-12-03 17:16             ` Robert Dewar
2003-12-03 16:32       ` Scott Robert Ladd
2003-12-03 16:53         ` Robert Dewar
2003-12-04  1:17           ` Gabriel Dos Reis
2003-12-04  1:27             ` Robert Dewar
2003-12-04  1:49               ` Gabriel Dos Reis
2003-12-04  7:33                 ` Robert Dewar
2003-12-04 13:45                   ` Gabriel Dos Reis
2003-12-04 13:51                     ` Scott Robert Ladd
2003-12-05  1:04                     ` Robert Dewar
2003-12-04 21:03               ` Toon Moene
2003-12-03 16:02   ` Paul Jarc
2003-12-03 16:46     ` Felix Lee
2003-12-04  0:35   ` Jan Hubicka
2003-12-04  1:27     ` Mike Stump
2003-12-04 18:40       ` Joe Buck
2003-12-02 13:01 Gareth McCaughan
2003-12-02 14:15 ` Felix Lee
2003-12-02 14:19 ` Scott Robert Ladd
2003-12-03 11:55 ` Gerald Pfeifer
2003-12-03 11:58   ` Scott A Crosby
2003-11-23 18:09 Scott Robert Ladd
2003-11-23 18:19 ` Jan Hubicka
2003-11-23 18:50   ` Scott Robert Ladd
2003-11-23 19:30     ` Jan Hubicka
2003-11-23 20:06       ` Scott Robert Ladd
2003-11-23 20:11         ` Jan Hubicka
2003-11-23 23:16           ` Scott Robert Ladd
2003-11-24  0:09             ` Jamie Lokier
2003-11-24  5:15               ` Scott Robert Ladd
2003-11-24 20:54                 ` tm_gccmail
2003-12-02  9:06             ` Zack Weinberg
2003-12-02 14:09               ` Scott Robert Ladd
2003-12-02 14:24               ` Jan Hubicka
2003-12-02 16:01                 ` Felix Lee
2003-12-02 22:25                   ` Scott Robert Ladd
2003-12-02 22:29                     ` Eric Christopher
2003-12-02 22:47                     ` Ian Lance Taylor
2003-12-02 17:09               ` David Carlton
2003-12-02 17:23                 ` Zack Weinberg
2003-12-02 17:31                   ` David Edelsohn
2003-12-02 17:44                   ` David Carlton
2003-12-02 19:01                 ` Robert Dewar
2003-12-02 21:00                   ` Joe Buck
2003-12-02 21:29                     ` Robert Dewar
2003-12-02 20:15                 ` Geoff Keating
2003-12-02 20:36                   ` David Carlton
2003-11-23 20:57     ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).