* Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27  1:04 UTC
  To: gcc; +Cc: m.hayes, rth, law, dan, jh

Hello,

I would like to know your opinion on this subject. Why I believe the old
loop optimizer should be dropped as soon as possible:

1) It destroys the CFG, and thus profile feedback/estimation must be run
   after it. It would be cool to have it available at least in gcse, and
   perhaps other earlier passes too. This could probably be solved by
   just dropping the part that manipulates the CFG (the unroller) and a
   relatively simple modification of the rest.

2) It requires the loop notes. The need to preserve them limits and
   complicates some pre-loop passes (cfg_cleanups, copy_loop_headers,
   ...) and enables the loop optimizer to work only for a limited class
   of loops. This does not seem to be fixable without a radical rewrite
   of the optimizer.

What the loop optimizer consists of:
-- unroller -- already replaced by the new one.
-- doloop optimization -- can be trivially converted to use the new loop
   infrastructure
-- loop invariant motion -- should be unnecessary with gcse; I am
   currently working on improving gcse so that it indeed is
-- prefetching, strength reduction, induction variable elimination --
   this is a problem. They are way too magic for me to be able to just
   modify them somehow; I would basically have to rewrite them from
   scratch (of course some parts could be used, but it would still be
   quite a lot of work)

Any ideas?

Zdenek

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer
  From: tm_gccmail @ 2003-02-27  1:12 UTC
  To: Zdenek Dvorak; +Cc: gcc, m.hayes, rth, law, dan, jh

On Thu, 27 Feb 2003, Zdenek Dvorak wrote:

> I would like to know your opinion on this subject. Why I believe the
> old loop optimizer should be dropped as soon as possible:
...
> -- prefetching, strength reduction, induction variable elimination --
>    this is a problem. They are way too magic for me to be able to just
>    modify them somehow; I would basically have to rewrite them from
>    scratch (of course some parts could be used, but it would still be
>    quite a lot of work)

The "old" loop optimizer does BIV-to-GIV flattening, which is important
on processors without double-register addressing modes such as the SH,
MIPS16, Thumb, and a few others.

Does the new loop optimizer support this feature?

> Any ideas?
>
> Zdenek

Toshi
* Re: Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27  2:27 UTC
  To: tm_gccmail; +Cc: gcc, m.hayes, rth, law, dan, jh

Hello,

> The "old" loop optimizer does BIV-to-GIV flattening, which is important
> on processors without double-register addressing modes such as the SH,
> MIPS16, Thumb, and a few others.
>
> Does the new loop optimizer support this feature?

No; as I said, anything related to ivs is a problem.

Zdenek
* Re: Dropping of old loop optimizer
  From: tm_gccmail @ 2003-02-27  4:07 UTC
  To: Zdenek Dvorak; +Cc: gcc, m.hayes, rth, law, dan, jh

On Thu, 27 Feb 2003, Zdenek Dvorak wrote:

> > The "old" loop optimizer does BIV-to-GIV flattening, which is
> > important on processors without double-register addressing modes such
> > as the SH, MIPS16, Thumb, and a few others.
> >
> > Does the new loop optimizer support this feature?
>
> No; as I said, anything related to ivs is a problem.

Then I personally (and probably most users of the SH) would prefer that
the old loop optimizer be retained until the new loop optimizer supports
BIV flattening.

This one optimization is extremely important; if we don't have it, the
compiler will generate a move/shift/add operation for each IV-related
load/store within the loop, which kills performance.

Toshi
* Re: Dropping of old loop optimizer
  From: tm_gccmail @ 2003-02-27  4:24 UTC
  To: Zdenek Dvorak; +Cc: gcc, m.hayes, rth, law, dan, jh

> Then I personally (and probably most users of the SH) would prefer that
> the old loop optimizer be retained until the new loop optimizer
> supports BIV flattening.
>
> This one optimization is extremely important; if we don't have it, the
> compiler will generate a move/shift/add operation for each IV-related
> load/store within the loop, which kills performance.

I forgot to mention that the old loop optimizer also knows not to hoist
address inheritance out of the loop, which would increase register
pressure dramatically. If the loop optimizer can leave the address
calculations inside the loop, then regmove can simplify them
properly... this is very important.

Toshi
* Re: Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27 11:47 UTC
  To: tm_gccmail; +Cc: gcc, m.hayes, rth, law, dan, jh

Hello,

> Then I personally (and probably most users of the SH) would prefer that
> the old loop optimizer be retained until the new loop optimizer
> supports BIV flattening.

You perhaps misunderstood what I wrote. I have never suggested dropping
the loop optimizer now; I am asking for the best way to replace its
functionality, so that it can be dropped.

Zdenek
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 15:48 UTC
  To: Zdenek Dvorak; +Cc: tm_gccmail, gcc, m.hayes, rth, law, dan, jh

Hi,

On Thu, Feb 27, 2003 at 02:04:14AM +0100, Zdenek Dvorak wrote:
> > Does the new loop optimizer support this feature?
>
> No; as I said, anything related to ivs is a problem.

Right now I'm working on a pass that enriches the SSA representation
with information about the IV functions. IVs are represented by chains
of recurrences as described in:

http://citeseer.nj.nec.com/485503.html
http://citeseer.nj.nec.com/vanengelen00symbolic.html

In contrast to these papers I'll not transform the underlying
representation, but enrich SSA with symbolic information that analyzers,
such as the dependence tester, or optimizers will use.

As long as you have an SSA representation over RTL it should be
straightforward to implement it from the tree-SSA version.

Right now I have a working multivariate and periodic CR system. I have
extended the CR system to flip-flop operations, which introduce
periodicity effects.

I am still working on extending the CR system to operations that neither
Zima nor van Engelen defined (for example the multiplication of
exponential CRs by polynomial CRs, or the addition of exponential CRs).
I will perform these operations symbolically, without expecting to have
a code generator at the end as is the case in the previous papers. I
think that analyzers and optimizers should be implemented without mixing
the two.

Interfacing with the existing tree-SSA is the next step on my todo list.
Then I'll look at how it could be adapted to RTL.

Sebastian
* Re: Dropping of old loop optimizer
  From: Daniel Berlin @ 2003-02-27 16:06 UTC
  To: Pop Sébastian; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thursday, February 27, 2003, at 10:18 AM, Pop Sébastian wrote:

> Right now I'm working on a pass that enriches the SSA representation
> with information about the IV functions. IVs are represented by chains
> of recurrences as described in: [...]

I had started to work on this method, but decided it was too complex,
and moved to monotonic evolution.

Are you sure CRs are going to be fast enough?

--Dan
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 16:32 UTC
  To: Daniel Berlin; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thu, Feb 27, 2003 at 10:43:28AM -0500, Daniel Berlin wrote:

> I had started to work on this method, but decided it was too complex,
> and moved to monotonic evolution.

In contrast to the CR method, monotonic evolution (ME) gives approximate
data dependences: you'll have the direction but not the distance vector
of a dependence. Fact: not all optimizers can use approximate vectors.
On this point you cannot compare these two passes, but anyway ...

> Are you sure CRs are going to be fast enough?

..., I agree CR is slower than ME. ME was designed to be faster. We
could speed up CRs by generating them on demand; this way we'd limit
this "expensive" ("not so-" in my opinion, since I haven't tested it yet
:-) pass to a single loop nest.

Sebastian
* Re: Dropping of old loop optimizer
  From: Daniel Berlin @ 2003-02-27 19:43 UTC
  To: Pop Sébastian; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thursday, February 27, 2003, at 11:14 AM, Pop Sébastian wrote:

> In contrast to the CR method, monotonic evolution (ME) gives
> approximate data dependences: you'll have the direction but not the
> distance vector of a dependence. Fact: not all optimizers can use
> approximate vectors.

Um, actually, all the papers I have on ME describe a distance-extended
lattice to solve this problem. I wouldn't have considered ME if it
couldn't do distance information.
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-28 13:06 UTC
  To: Daniel Berlin; +Cc: gcc

On Thu, Feb 27, 2003 at 02:15:29PM -0500, Daniel Berlin wrote:

> Um, actually, all the papers I have on ME describe a distance-extended
> lattice to solve this problem. I wouldn't have considered ME if it
> couldn't do distance information.

You're right. I had just read the old version of their paper; in the new
one they extend ME to distance intervals. However, I'm not yet convinced
that ME is faster than CR when ME also computes closed forms...

Sebastian
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 17:13 UTC
  To: Daniel Berlin; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thu, Feb 27, 2003 at 04:14:48PM +0000, Pop Sébastian wrote:

> In contrast to the CR method, monotonic evolution (ME) gives
> approximate data dependences: you'll have the direction but not the
> distance vector of a dependence.

I forgot to mention the conclusion of one of the papers on ME,
"Monotonic Evolution: an Alternative to Induction Variable Substitution
for Dependence Analysis" by Peng Wu, Albert Cohen, Jay Hoeflinger, and
David Padua:

"Closed form expression computation [...] is also critical for removing
dependences due to computation of induction variables themselves."

In other words, if you generate code for IV functions at loop-phi-nodes,
then you can parallelize these loops.

Sebastian
* Re: Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27 16:59 UTC
  To: Pop Sébastian; +Cc: tm_gccmail, gcc, m.hayes, rth, law, dan, jh

Hello,

> Right now I'm working on a pass that enriches the SSA representation
> with information about the IV functions. IVs are represented by chains
> of recurrences as described in:
>
> http://citeseer.nj.nec.com/485503.html
> http://citeseer.nj.nec.com/vanengelen00symbolic.html
...
> As long as you have an SSA representation over RTL it should be
> straightforward to implement it from the tree-SSA version.

The problem I see is that rtl ssa is broken (or never worked?); but
perhaps we could do this kind of optimization just on trees (which
unfortunately also aren't ready for serious usage yet, right?)?

> Interfacing with the existing tree-SSA is the next step on my todo
> list. Then I'll look at how it could be adapted to RTL.

How long do you think it will take before the result is usable? (I
wonder whether it makes sense to work on some temporary, not so clean
but faster solution.)

Zdenek
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 18:30 UTC
  To: Zdenek Dvorak; +Cc: tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote:

> The problem I see is that rtl ssa is broken (or never worked?); but

Last time I had a close look at rtl was about two years ago :-) Dan is
the expert to ask about the status of SSA at the rtl level (he adapted
the CCP code from rtl to tree-ssa).

> perhaps we could do this kind of optimization just on trees (which
> unfortunately also aren't ready for serious usage yet, right?)?

CCP (conditional constant propagation) and DCE (dead code elimination)
seem to work fine on tree-SSA.

> How long do you think it will take before the result is usable?

Since this is an analyzer, the work could go faster than for an
optimization pass, because difficult cases can be delayed to a later
refinement. In these difficult cases the analyzer will just reply "don't
know".

> (I wonder whether it makes sense to work on some temporary, not so
> clean but faster solution.)

Seems possible, with successive refinements.

Sebastian
* tree-ssa status (was: Re: Dropping of old loop optimizer)
  From: Steven Bosscher @ 2003-02-27 19:15 UTC
  To: Pop Sébastian, dnovillo; +Cc: gcc

On Thu 27-02-2003, at 18:13, Pop Sébastian wrote:

> CCP (conditional constant propagation) and DCE (dead code elimination)
> seem to work fine on tree-SSA.

There is a difference between things that "work fine" and things that
"are ready". What, for example, is the compile-time performance on the
branch? SPEC results compared to mainline? If I correctly interpret
Diego's SPEC tester results, it does not quite look like it's there yet.

Diego, you suggested last month that you would like to merge the branch
in the near future (on the gomp list IIRC). What are the show stoppers
you know about, other than that it will not be ready before stage1 ends?
The branch has moved so fast over the last two months that a status
update would be nice...

(I can send you a patch to update the web page if you like. It still
claims you're using FUD chains :-)

Greetz
Steven
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer)
  From: Diego Novillo @ 2003-02-27 20:06 UTC
  To: Steven Bosscher; +Cc: gcc

On Thu, 27 Feb 2003, Steven Bosscher wrote:

> There is a difference between things that "work fine" and things that
> "are ready".

Yup.

> What, for example, is the compile-time performance on the branch? SPEC
> results compared to mainline? If I correctly interpret Diego's SPEC
> tester results, it does not quite look like it's there yet.

Yes, we are not quite there yet.

Since you brought it up, I got curious and gathered a few stats. This is
how the branch looks today in terms of compile time compared with
mainline:

- All times are in seconds.
- All the compilers are built with ENABLE_CHECKING.
- Bootstrap times are for all default languages (C, C++, Java, Fortran
  and ObjC).

------------------------------------------------------------------------
Build times        mainline     tree-ssa     Difference
------------------------------------------------------------------------
SPEC95                  375          418           +11%
SPEC2000                517          586           +13%
Bootstrap              3247         3588           +10%
------------------------------------------------------------------------

In terms of runtime performance, the branch is ~3% slower on SPECint2000
and ~11% slower on SPECfp2000. There are a couple of individual tests
where the branch is marginally faster than mainline. The branch also
fails to compile vpr, eon and perlbmk, so we need to look at those
failures eventually. Boring details at the end of this message.

For SPEC95, the branch is about 7% slower than mainline on SPECint95
(SPECfp95 is Fortran only, so the results are virtually identical). The
branch fails to compile m88ksim and produces better results for
compress.

> Diego, you suggested last month that you would like to merge the
> branch in the near future (on the gomp list IIRC). What are the show
> stoppers you know about, other than that it will not be ready before
> stage1 ends?

My idea is to propose a merge only when we meet the following:

- Bootstrap times similar to or better than mainline.
- Comparable SPEC95 and SPEC2000 results.
- Number of testsuite failures comparable with mainline.
- Tree optimizers enabled for C and C++.

The branch may have to comply with additional requirements from the
community; that's just my own set of criteria, and I may have missed
something. The usual battery of cross and native builds will have to be
done as well.

In terms of things left to do, the more important ones that come to mind
are:

- A real SSA->normal pass. The current unSSA pass simply drops the SSA
  version numbers. If two versions of the same variable have been moved
  across life boundaries, we silently generate wrong code.

- Support for try/catch in the flowgraph. With the addition of C++, we
  now support try/catch in GIMPLE. The flowgraph still doesn't know
  that.

- Investigate what can be modified/removed/simplified in the RTL backend
  due to the different "flavour" of RTL that the tree optimizers emit.
  I suspect that we could get a big compile-time boost if we could get
  rid of unnecessary work in the backend. We could also find that we
  need some additional passes at the tree level that we still haven't
  thought of.

So, it's a lot of work. Will it be ready for 3.5's stage1? I don't know,
particularly if the list of requirements grows bigger. The integration
work will also be interesting. I diff'd mainline and the branch a few
days ago:

307 files changed, 80994 insertions(+), 4342 deletions(-)

> (I can send you a patch to update the web page if you like. It still
> claims you're using FUD chains :-)

Oops. Patches appreciated!

Diego.

---------------------------------------------------------------------------
SPECint2000

Benchmark        Mainline    tree-ssa      % diff
164.gzip           669.33      646.19    -  3.46%
175.vpr            430.87        0.00    -100.00%
176.gcc            678.95      649.51    -  4.34%
181.mcf            418.57      425.79    +  1.73%
186.crafty         689.60      647.80    -  6.06%
197.parser         561.97      539.01    -  4.09%
252.eon            583.56        0.00    -100.00%
253.perlbmk        868.12        0.00    -100.00%
254.gap            736.19      711.74    -  3.32%
255.vortex         832.92      837.57    +  0.56%
256.bzip2          545.44      513.55    -  5.85%
300.twolf          538.90      520.49    -  3.42%
mean               614.52      599.10    -  2.51%

SPECfp2000

Benchmark        Mainline    tree-ssa      % diff
168.wupwise        824.20      808.10    -  1.95%
171.swim           506.86      502.08    -  0.94%
172.mgrid          485.86      515.41    +  6.08%
173.applu          555.19      536.63    -  3.34%
177.mesa           731.30      678.08    -  7.28%
178.galgel           0.00        0.00    INF
179.art            373.37      191.51    - 48.71%
183.equake         819.99      820.92    +  0.11%
187.facerec          0.00        0.00    INF
188.ammp           359.64      351.84    -  2.17%
189.lucas            0.00        0.00    INF
191.fma3d            0.00        0.00    INF
200.sixtrack       314.80      304.86    -  3.16%
301.apsi             0.00      340.96    INF
mean               521.57      461.42    - 11.53%
---------------------------------------------------------------------------

---------------------------------------------------------------------------
SPECint95

Benchmark        Mainline    tree-ssa      % diff
099.go              43.11       42.86    -  0.59%
124.m88ksim         30.64        0.00    -100.00%
126.gcc             33.48       31.72    -  5.26%
129.compress        22.39       23.34    +  4.26%
130.li              32.30       30.40    -  5.88%
132.ijpeg           34.91       30.85    - 11.62%
134.perl            46.71        0.00    -100.00%
147.vortex          32.95       32.14    -  2.45%
mean                33.84       31.39    -  7.23%
---------------------------------------------------------------------------
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer)
  From: Steven Bosscher @ 2003-02-27 20:36 UTC
  To: Diego Novillo; +Cc: gcc

On Thu 27-02-2003, at 20:46, Diego Novillo wrote:

> ------------------------------------------------------------------------
> Build times        mainline     tree-ssa     Difference
> ------------------------------------------------------------------------
> SPEC95                  375          418           +11%
> SPEC2000                517          586           +13%
> Bootstrap              3247         3588           +10%
> ------------------------------------------------------------------------

Mwah, that is not as bad as I thought. And comparing the branch to
mainline just like that is not really fair IMO (which is why I didn't do
it), because eventually there will be optimizations we still do on RTL
that have virtually no effect anymore while they may ruin compile time.
If you take that into account, it really doesn't look that bad. It would
be interesting to know the total time spent in the RTL optimizers for
mainline and the branch...

> In terms of runtime performance, the branch is ~3% slower on
> SPECint2000 and ~11% slower on SPECfp2000. There are a couple of
> individual tests where the branch is marginally faster than mainline.
> The branch also fails to compile vpr, eon and perlbmk.

Ooof ouch, 11%! The worst slowdown is obviously in 179.art. What's so
special about it that makes the branch twice as slow?

> - Investigate what can be modified/removed/simplified in the RTL
>   backend due to the different "flavour" of RTL that the tree
>   optimizers emit. I suspect that we could get a big compile-time
>   boost if we could get rid of unnecessary work in the backend.

Define "unnecessary"... Ideally we'd have all front ends use the tree
infrastructure. That would allow us to blow away big chunks of the RTL
optimizers, and indeed one would expect a potentially significant win in
compile time from that. But it seems that we'll be stuck with at least
one front end that will probably never be able to do function-at-a-time
compilation (g77). It was said here once that Ada will do
function-at-a-time??

> So, it's a lot of work. Will it be ready for 3.5's stage1? I don't
> know. [...] I diff'd mainline and the branch a few days ago:
>
> 307 files changed, 80994 insertions(+), 4342 deletions(-)

How much of those insertions are in new files?

> > (I can send you a patch to update the web page if you like. It still
> > claims you're using FUD chains :-)
>
> Oops. Patches appreciated!

I'll make one.

Greetz
Steven
* Re: tree-ssa status 2003-02-27 20:36 ` Steven Bosscher @ 2003-02-27 21:17 ` Zack Weinberg 2003-02-27 21:23 ` Paul Brook 2003-02-27 22:47 ` Steven Bosscher 2003-02-27 21:23 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Diego Novillo 1 sibling, 2 replies; 33+ messages in thread From: Zack Weinberg @ 2003-02-27 21:17 UTC (permalink / raw) To: Steven Bosscher; +Cc: Diego Novillo, gcc Steven Bosscher <s.bosscher@student.tudelft.nl> writes: > But it seems that we'll be stuck with at least one front end that will > probably never be able to do function-at-a-time compilation (g77). Isn't g95 targeted for the 3.5 time frame as well? If so, we ought to be able to scrap g77 entirely. zw ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 21:17 ` tree-ssa status Zack Weinberg @ 2003-02-27 21:23 ` Paul Brook 2003-02-28 21:15 ` Toon Moene 2003-02-27 22:47 ` Steven Bosscher 1 sibling, 1 reply; 33+ messages in thread From: Paul Brook @ 2003-02-27 21:23 UTC (permalink / raw) To: Zack Weinberg, Steven Bosscher; +Cc: Diego Novillo, gcc On Thursday 27 February 2003 8:57 pm, Zack Weinberg wrote: > Steven Bosscher <s.bosscher@student.tudelft.nl> writes: > > But it seems that we'll be stuck with at least one front end that will > > probably never be able to do function-at-a-time compilation (g77). > > Isn't g95 targeted for the 3.5 time frame as well? If so, we ought to > be able to scrap g77 entirely. While g95 may be nominally targeted for this timeframe, I very much doubt it will be considered an acceptable replacement for g77 until some time afterwards. A working g95 and a g95 which is as good as (or better than) g77 for fortran 77 code are two totally different things. Paul Brook ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 21:23 ` Paul Brook @ 2003-02-28 21:15 ` Toon Moene 0 siblings, 0 replies; 33+ messages in thread From: Toon Moene @ 2003-02-28 21:15 UTC (permalink / raw) To: Paul Brook; +Cc: Zack Weinberg, Steven Bosscher, Diego Novillo, gcc Paul Brook wrote: > On Thursday 27 February 2003 8:57 pm, Zack Weinberg wrote: > >>Steven Bosscher <s.bosscher@student.tudelft.nl> writes: >> >>>But it seems that we'll be stuck with at least one front end that will >>>probably never be able to do function-at-a-time compilation (g77). >> >>Isn't g95 targeted for the 3.5 time frame as well? If so, we ought to >>be able to scrap g77 entirely. > While g95 may be nominally targeted for this timeframe, I very much doubt it > will be considered an acceptable replacement for g77 until some time > afterwards. A working g95 and a g95 which is as good as (or better than) > g77 for fortran 77 code are two totally different things. For the benefit of those not intimately involved in the Fortran business: most closed-source Fortran compilers consisted of two separate executables for about a decade before the Fortran 90/95 compiler was "good enough" at the Fortran 77 stuff to be the only offering. We hope to work faster than that, but don't hold your breath. -- Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc-g95.sourceforge.net/ (under construction) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 21:17 ` tree-ssa status Zack Weinberg 2003-02-27 21:23 ` Paul Brook @ 2003-02-27 22:47 ` Steven Bosscher 1 sibling, 0 replies; 33+ messages in thread From: Steven Bosscher @ 2003-02-27 22:47 UTC (permalink / raw) To: Zack Weinberg; +Cc: Diego Novillo, gcc, Paul Brook On Thu 2003-02-27, at 21:57, Zack Weinberg wrote: > Steven Bosscher <s.bosscher@student.tudelft.nl> writes: > > > But it seems that we'll be stuck with at least one front end that will > > probably never be able to do function-at-a-time compilation (g77). > > Isn't g95 targeted for the 3.5 time frame as well? Yeah, right. I wish! :-\ Right now we can't even generate code for all of F77. And that is supposed to be the *easy* part. Nope, there is still a long way to go. We're getting closer to being able to generate code for the whole language, but the code quality is not all that great. And that is just the compiler side of things. The library is another issue that needs to be addressed, and while bits and pieces are in place, it will still take a huge effort to finish it. Greetz Steven ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 20:36 ` Steven Bosscher 2003-02-27 21:17 ` tree-ssa status Zack Weinberg @ 2003-02-27 21:23 ` Diego Novillo 2003-02-27 23:09 ` law 1 sibling, 1 reply; 33+ messages in thread From: Diego Novillo @ 2003-02-27 21:23 UTC (permalink / raw) To: Steven Bosscher; +Cc: gcc On Thu, 2003-02-27 at 15:31, Steven Bosscher wrote: > It would be interesting to know the total time spent in the RTL > optimizers for mainline and branch... > Yes, it would. This is something that Jeff Law has started to do recently. Most of the recent compile time improvements come from profiling the implementation and finding obvious hot spots. We will be doing lots of that in the coming weeks/months. > The worst slowdown is obviously in 179.art. What's so special about it > that makes the branch twice as slow? > No idea yet. This is part of what still needs to be done. > > - Investigate what can be modified/removed/simplified in the RTL > > backend due to the different "flavour" of RTL that the tree > > optimizers emit. I suspect that we could get a big compile > > time boost if we could get rid of unnecessary work in the > > backend. We could also find that we need some more additional > > passes at the tree level that we still haven't thought of. > > Define "unnecessary"... > If we can make simplifying assumptions in the RTL optimizers, we could make them run faster. This of course would need to be predicated. As you point out, not all the front ends go through tree-ssa. It is still unclear to me whether this is impossible or merely difficult. > > So, it's a lot of work. Will it be ready for 3.5's stage1? I > > don't know. Particularly if the list of requirements grows > > bigger. The integration work will also be interesting. I diff'd > > mainline and the branch a few days ago: > > > > 307 files changed, 80994 insertions(+), 4342 deletions(-) > > How much of those insertions are in new files? > Good point. About 70000. 
Diego. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 21:23 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Diego Novillo @ 2003-02-27 23:09 ` law 2003-02-27 23:36 ` tree-ssa status Zack Weinberg 0 siblings, 1 reply; 33+ messages in thread From: law @ 2003-02-27 23:09 UTC (permalink / raw) To: Diego Novillo; +Cc: Steven Bosscher, gcc In message <1046379746.3100.23.camel@shadowfax>, Diego Novillo writes: >On Thu, 2003-02-27 at 15:31, Steven Bosscher wrote: > >> It would be interesting to know the total time spent in the RTL >> optimizers for mainline and branch... >> >Yes, it would. This is something that Jeff Law has started to do >recently. Most of the recent compile time improvements come from >profiling the implementation and finding obvious hot spots. We will be >doing lots of that in the coming weeks/months. More correctly, I've been measuring time in the SSA path independently from the rest of the compiler. This makes it a lot easier to see where the SSA path is wasting time (ie, it isn't hidden by something like cse or gcse going crazy). I haven't tried to measure and compare the time the two branches spend in the RTL optimizers. We'll probably get to that point in the not too terribly distant future. >> The worst slowdown is obviously in 179.art. What's so special about it >> that makes the branch twice as slow? >> >No idea yet. This is part of what still needs to be done. Right. Our focus so far has been on getting the underlying infrastructure in place and more recently making sure that infrastructure runs reasonably quickly. We haven't started looking at the generated code yet. It won't make a lot of sense to do that until we have a real translator out of SSA. >> Define "unnecessary"... >> >If we can make simplifying assumptions in the RTL optimizers, we could >make them run faster. This of course would need to be predicated. As >you point out, not all the front ends go through tree-ssa. 
>It is still unclear to me whether this is impossible or merely difficult. One canonical example I use is null pointer check elimination, which can be completely subsumed by a tree-ssa version. We already know that null pointer check elimination is relatively slow and memory intensive (that's why it blocks the bitvectors rather than doing everything in parallel). The other canonical example is all the path following performed by cse1. I believe there is an excellent chance we'll be able to have cse1 do a block-local CSE once the tree-ssa code is doing some basic value numbering and copy prop on the dominator tree. This code is also known to be a major CPU hog. In both cases the SSA equivalent optimizations can be made bloody fast and memory efficient. >> > So, it's a lot of work. Will it be ready for 3.5's stage1? I >> > don't know. Particularly if the list of requirements grows >> > bigger. The integration work will also be interesting. I diff'd >> > mainline and the branch a few days ago: >> > >> > 307 files changed, 80994 insertions(+), 4342 deletions(-) >> >> How much of those insertions are in new files? >> >Good point. About 70000. Also keep in mind the change to carry file/line information in a common location touched a lot of code in basically mindless ways. Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
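[Editorial sketch, not part of the original thread.] For readers unfamiliar with the block-local CSE Jeff mentions, here is a minimal sketch of local value numbering in Python; it is hypothetical illustration code, not GCC's implementation, and it assumes single-assignment (SSA-like) input so table entries never go stale. Each computation is keyed by its operator plus the value keys of its operands; recomputing an already-available value becomes a copy from the register that first produced it.

```python
def value_number(block):
    """block: list of (dest, op, arg1, arg2) quads for one basic block.
    Assumes each dest is assigned exactly once (SSA-like input).
    Returns the block with redundant computations turned into copies."""
    table = {}   # (op, value-of-arg1, value-of-arg2) -> register holding it
    regvn = {}   # register -> the value key it currently holds
    out = []
    for dest, op, a1, a2 in block:
        # Look the operands up through regvn so copies/redundancies chain.
        key = (op, regvn.get(a1, a1), regvn.get(a2, a2))
        if key in table:
            # Same op applied to the same values: reuse the earlier result.
            out.append((dest, "copy", table[key], None))
        else:
            table[key] = dest
            out.append((dest, op, a1, a2))
        regvn[dest] = key
    return out
```

On a block like `t1 = a + b; t2 = a + b; t3 = t2 * c`, the second addition is replaced by a copy from `t1`, while the multiply is kept because its value key is new; this is the "bloody fast" single-pass, hash-table flavor of CSE, as opposed to cse1's path following.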
* Re: tree-ssa status 2003-02-27 23:09 ` law @ 2003-02-27 23:36 ` Zack Weinberg 2003-02-27 23:46 ` law 0 siblings, 1 reply; 33+ messages in thread From: Zack Weinberg @ 2003-02-27 23:36 UTC (permalink / raw) To: law; +Cc: Diego Novillo, Steven Bosscher, gcc law@redhat.com writes: > Also keep in mind the change to carry file/line information in a common > location touched a lot of code in basically mindless ways. That might be a candidate for a separate merge to mainline (he says, not having looked at the code at all). zw ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 23:36 ` tree-ssa status Zack Weinberg @ 2003-02-27 23:46 ` law 0 siblings, 0 replies; 33+ messages in thread From: law @ 2003-02-27 23:46 UTC (permalink / raw) To: Zack Weinberg; +Cc: Diego Novillo, Steven Bosscher, gcc In message <87lm015pgo.fsf@egil.codesourcery.com>, Zack Weinberg writes: >law@redhat.com writes: > >> Also keep in mind the change to carry file/line information in a common >> location touched a lot of code in basically mindless ways. > >That might be a candidate for a separate merge to mainline (he says, >not having looked at the code at all). I wouldn't recommend it. The tree-ssa code has very different characteristics in terms of how file/line information is carried around when compared to the mainline sources. As a result, the tradeoffs for which scheme is more memory efficient are very different. I did the algebra for the mainline a while back and concluded that the scheme used by tree-ssa won't make sense for the mainline if/until tree-ssa is the mainline. Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 20:06 ` Diego Novillo 2003-02-27 20:36 ` Steven Bosscher @ 2003-02-27 20:45 ` Daniel Berlin 2003-02-27 20:57 ` Diego Novillo 1 sibling, 1 reply; 33+ messages in thread From: Daniel Berlin @ 2003-02-27 20:45 UTC (permalink / raw) To: Diego Novillo; +Cc: Steven Bosscher, gcc On Thursday, February 27, 2003, at 02:46 PM, Diego Novillo wrote: > On Thu, 27 Feb 2003, Steven Bosscher wrote: > >> There is a difference between things that "work fine" and things that >> "are ready". >> > Yup. > >> What, for example, is the compile time performance on the branch? >> SPEC >> results compared to mainline? If I correctly interpret Diego's SPEC >> tester results, it does not quite look like it's there yet. >> > Yes, we are not quite there yet. > > Since you brought it up, I got curious and gathered a few stats. > This is how the branch looks today in terms of compile time > compared with mainline: > > - All times are in seconds. > - All the compilers are built with ENABLE_CHECKING. > - Bootstrap times are for all default languages (C, C++, Java, > Fortran and ObjC). Is this still without copy propagation, or is that in a tree you used here? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 20:45 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Daniel Berlin @ 2003-02-27 20:57 ` Diego Novillo 0 siblings, 0 replies; 33+ messages in thread From: Diego Novillo @ 2003-02-27 20:57 UTC (permalink / raw) To: Daniel Berlin; +Cc: Steven Bosscher, gcc On Thu, 2003-02-27 at 15:37, Daniel Berlin wrote: > > On Thu, 27 Feb 2003, Steven Bosscher wrote: > > > >> There is a difference between things that "work fine" and things that > >> "are ready". > >> > > Yup. > > > >> What, for example, is the compile time performance on the branch? > >> SPEC > >> results compared to mainline? If I correctly interpret Diego's SPEC > >> tester results, it does not quite look like it's there yet. > >> > > Yes, we are not quite there yet. > > > > Since you brought it up, I got curious and gathered a few stats. > > This is how the branch looks today in terms of compile time > > compared with mainline: > > > > - All times are in seconds. > > - All the compilers are built with ENABLE_CHECKING. > > - Bootstrap times are for all default languages (C, C++, Java, > > Fortran and ObjC). > > Is this still without copy propagation, or is that in a tree you used > here? > We still don't have an implementation of copy propagation in the branch. Copyprop is one of the transformations that needs the real unSSA conversion. All the figures I posted are taken with pristine copies of tree-ssa and mainline as of 2003-02-26. Diego. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-27 18:30 ` Pop Sébastian 2003-02-27 19:15 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Steven Bosscher @ 2003-02-27 19:59 ` Daniel Berlin 2003-02-27 20:49 ` Michael Matz 2003-02-28 10:46 ` Jan Hubicka 1 sibling, 2 replies; 33+ messages in thread From: Daniel Berlin @ 2003-02-27 19:59 UTC (permalink / raw) To: Pop Sébastian Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: > On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: >> >> The problem I see is that rtl ssa is broken (or never worked?); but > Last time I had a close look at rtl was about two years ago :-) > Dan is the expert to ask about the status of SSA at rtl level (he > adapted > the CCP code from rtl to tree-ssa). The problem with RTL SSA is hard registers and subregs. There are hard registers assigned in certain cases during initial RTL generation. These can't be renamed, obviously. You run into difficulties with subregs as well (in relation to liveness and renaming). Thus, one has the following options: 1. Ignore them while creating SSA, and make sure optimizers don't touch them. (not clean) 2. Perform magic to save and restore the pieces of code using them. (not clean, not easy) 3. Don't generate hard registers until after SSA is done with (not all that easy, but could be done), and don't ever generate subregs (not really possible without severely rearchitecting RTL). The reason we need subregs is because of how we do registers. Most compilers do the reverse of what we do in relation to registers. We have a single operator, like SET, and multiple register modes/sizes that can be operands. So to get a full register copy, one simply does something like (set (reg:SI 50) (reg:SI 52)) And to get a piece of a register, one needs a subregister operator. 
like (set (reg:SI 50) (subreg:SI (reg:DI 51) 0)) Most compilers have multiple operators (one for each size), and no register sizes. So to get a full register copy, they do: (set:DIDI (reg 50) (reg 51)) and to get a piece of a register, they do (set:SIDI (reg 50) (reg 51)) Now consider trying to set a piece of a register we would do: (set (subreg:SI (reg:DI 51)) (reg:SI 50)) which is very hard to think about how to rename into SSA (because it's a partial assignment, and the subreg itself has no name) while they would do: (set:DISI (reg 51) (reg 50)) which is still a full def, and easy to rename. Get it? > >> perhaps we could do this kind of optimizations just on trees (that >> unfortunately also aren't ready for serious usage yet, right?)? >> > CCP (Conditional constant propagation) and > DCE (Dead code elimination) > seem to work fine on tree-SSA. > and PRE *will* work fine just as soon as Andrew finishes and posts the insertion code. --Dan ^ permalink raw reply [flat|nested] 33+ messages in thread
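[Editorial sketch, not part of the original thread.] To make the renaming problem Daniel describes concrete, here is a minimal sketch of SSA renaming over straight-line code; this is hypothetical Python, not GCC's renamer. A full definition simply gets a fresh version number, but a partial (subreg-style) definition must additionally be modeled as a use of the destination's previous version, because the bits it does not write flow through unchanged.

```python
def rename(insns):
    """insns: list of (dest, is_partial, srcs) triples for one basic block.
    Returns instructions with each register renamed to a (reg, version) pair."""
    version = {}  # register -> current SSA version (0 = incoming/undefined)
    out = []
    for dest, is_partial, srcs in insns:
        uses = [(r, version.get(r, 0)) for r in srcs]
        if is_partial:
            # A subreg store overwrites only part of dest, so the previous
            # value is live into this definition: record it as a hidden use.
            uses.append((dest, version.get(dest, 0)))
        version[dest] = version.get(dest, 0) + 1
        out.append(((dest, version[dest]), uses))
    return out
```

A full copy of reg 51 into reg 50 followed by a partial store into reg 50 renames to a def of version 2 of r50 that also *uses* version 1 of r50: the partial def is really a read-modify-write, which is exactly the kind of assignment that has no clean name in plain SSA form.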
* Re: Dropping of old loop optimizer 2003-02-27 19:59 ` Dropping of old loop optimizer Daniel Berlin @ 2003-02-27 20:49 ` Michael Matz 2003-02-28 10:46 ` Jan Hubicka 1 sibling, 0 replies; 33+ messages in thread From: Michael Matz @ 2003-02-27 20:49 UTC (permalink / raw) To: Daniel Berlin Cc: Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh Hi, On Thu, 27 Feb 2003, Daniel Berlin wrote: > Now consider trying to set a piece of a register > > we would do: > > (set (subreg:SI (reg:DI 51)) (reg:SI 50)) > > while they would do: > (set:DISI (reg 51) (reg 50)) > > which is still a full def, and easy to rename. But those compilers would need to do magic for anything other than the lowest part, i.e. the difference between (subreg:SI (reg:DI) 0) and (subreg:SI (reg:DI) 4). And especially when setting a full DImode reg really piecewise (i.e. when the sets indeed are just partial and do not clobber the whole underlying reg). They basically need to set two SImode regs, and then do a parallel copy of those two regs into one DImode reg (one in the low, one in the high part), and then rely on something like coalescing and the reg allocator to allocate one of the SImode regs to the low and the other to the high part of a register (i.e. a limited form of bitwidth-aware allocation). It's not clear to me if one approach is really better than the other; they both have their issues. Ciao, Michael. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-27 19:59 ` Dropping of old loop optimizer Daniel Berlin 2003-02-27 20:49 ` Michael Matz @ 2003-02-28 10:46 ` Jan Hubicka 2003-02-28 13:15 ` Pop Sébastian 2003-02-28 15:29 ` law 1 sibling, 2 replies; 33+ messages in thread From: Jan Hubicka @ 2003-02-28 10:46 UTC (permalink / raw) To: Daniel Berlin Cc: Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh > > On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: > > >On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: > >> > >>The problem I see is that rtl ssa is broken (or never worked?); but > >Last time I had a close look at rtl was about two years ago :-) > >Dan is the expert to ask about the status of SSA at rtl level (he > >adapted > >the CCP code from rtl to tree-ssa). > The problem with RTL SSA is hard registers and subregs. > There are hard registers assigned in certain cases during initial RTL > generation. > These can't be renamed, obviously. The tree-SSA branch represents the SSA form as a data structure on top of the trees, without modifying the trees themselves, right? Why is that not feasible for RTL (i.e. you wouldn't need to rename the registers)? Honza ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-28 10:46 ` Jan Hubicka @ 2003-02-28 13:15 ` Pop Sébastian 2003-02-28 15:29 ` law 1 sibling, 0 replies; 33+ messages in thread From: Pop Sébastian @ 2003-02-28 13:15 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc On Fri, Feb 28, 2003 at 10:54:45AM +0100, Jan Hubicka wrote: > > The tree-SSA represents the SSA form as datastructure on the top of the > trees not modifying the trees themselves, right? Trees are modified and contain SSA nodes: http://gcc.gnu.org/ml/gcc-patches/2003-01/msg02182.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-28 10:46 ` Jan Hubicka 2003-02-28 13:15 ` Pop Sébastian @ 2003-02-28 15:29 ` law 2003-02-28 21:56 ` Daniel Berlin 1 sibling, 1 reply; 33+ messages in thread From: law @ 2003-02-28 15:29 UTC (permalink / raw) To: Jan Hubicka Cc: Daniel Berlin, Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, dan In message <20030228095445.GD5431@kam.mff.cuni.cz>, Jan Hubicka writes: >> >> On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: >> >> >On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: >> >> >> >>The problem I see is that rtl ssa is broken (or never worked?); but >> >Last time I had a close look at rtl was about two years ago :-) >> >Dan is the expert to ask about the status of SSA at rtl level (he >> >adapted >> >the CCP code from rtl to tree-ssa). >> The problem with RTL SSA is hard registers and subregs. >> There are hard registers assigned in certain cases during initial RTL >> generation. >> These can't be renamed, obviously. > >The tree-SSA represents the SSA form as datastructure on the top of the >trees not modifying the trees themselves, right? Why it is not feasible >for RTL (ie you won't need to rename the registers) hard-regs and subregs aren't really the problem with rtl-ssa; they're both solvable. The first by ignoring them, the second with some annoying, but not terribly difficult work in the ssa->normal translator. What is far more of a problem for rtl-ssa is the lack of any kind of generic-level rtl. The tree-SSA branch is using a rewriting SSA form now. Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-28 15:29 ` law @ 2003-02-28 21:56 ` Daniel Berlin 0 siblings, 0 replies; 33+ messages in thread From: Daniel Berlin @ 2003-02-28 21:56 UTC (permalink / raw) To: law Cc: Jan Hubicka, Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, dan On Friday, February 28, 2003, at 09:29 AM, law@redhat.com wrote: > In message <20030228095445.GD5431@kam.mff.cuni.cz>, Jan Hubicka writes: >>> >>> On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: >>> >>>> On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: >>>>> >>>>> The problem I see is that rtl ssa is broken (or never worked?); but >>>> Last time I had a close look at rtl was about two years ago :-) >>>> Dan is the expert to ask about the status of SSA at rtl level (he >>>> adapted >>>> the CCP code from rtl to tree-ssa). >>> The problem with RTL SSA is hard registers and subregs. >>> There are hard registers assigned in certain cases during initial RTL >>> generation. >>> These can't be renamed, obviously. >> >> The tree-SSA represents the SSA form as datastructure on the top of >> the >> trees not modifying the trees themselves, right? Why it is not >> feasible >> for RTL (ie you won't need to rename the registers) > hard-regs and subregs really aren't really the problem with rtl-ssa, > they're both solvable. However, last I looked, your solution was going to be to just not do an SSA path when subregs existed. And if they were all that easy to solve, we would be doing register allocation on an SSA form, like every other compiler in existence. > The first by ignoring them, the second with > some annoying, but not terribly difficult work in the ssa->normal > translator. Both of which suck, to be honest, and *are* difficult enough that nobody has *done* it yet. > > What is far more of a problem for rtl-ssa is the lack of any kind of > generic level rtl. > With what properties, exactly? > The tree-SSA branch using a rewriting SSA form now. 
> > Jeff > ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2003-02-28 20:58 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-02-27 1:04 Dropping of old loop optimizer Zdenek Dvorak 2003-02-27 1:12 ` tm_gccmail 2003-02-27 2:27 ` Zdenek Dvorak 2003-02-27 4:07 ` tm_gccmail 2003-02-27 4:24 ` tm_gccmail 2003-02-27 11:47 ` Zdenek Dvorak 2003-02-27 15:48 ` Pop Sébastian 2003-02-27 16:06 ` Daniel Berlin 2003-02-27 16:32 ` Pop Sébastian 2003-02-27 19:43 ` Daniel Berlin 2003-02-28 13:06 ` Pop Sébastian 2003-02-27 17:13 ` Pop Sébastian 2003-02-27 16:59 ` Zdenek Dvorak 2003-02-27 18:30 ` Pop Sébastian 2003-02-27 19:15 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Steven Bosscher 2003-02-27 20:06 ` Diego Novillo 2003-02-27 20:36 ` Steven Bosscher 2003-02-27 21:17 ` tree-ssa status Zack Weinberg 2003-02-27 21:23 ` Paul Brook 2003-02-28 21:15 ` Toon Moene 2003-02-27 22:47 ` Steven Bosscher 2003-02-27 21:23 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Diego Novillo 2003-02-27 23:09 ` law 2003-02-27 23:36 ` tree-ssa status Zack Weinberg 2003-02-27 23:46 ` law 2003-02-27 20:45 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Daniel Berlin 2003-02-27 20:57 ` Diego Novillo 2003-02-27 19:59 ` Dropping of old loop optimizer Daniel Berlin 2003-02-27 20:49 ` Michael Matz 2003-02-28 10:46 ` Jan Hubicka 2003-02-28 13:15 ` Pop Sébastian 2003-02-28 15:29 ` law 2003-02-28 21:56 ` Daniel Berlin