From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernd Schmidt
To: egcs@cygnus.com
Subject: Re: Reload patch to improve 386 code
Date: Tue, 19 Aug 1997 08:08:21 -0000
Message-ID: <199708190621.IAA04731@haegar.physiol.med.tu-muenchen.de>
In-reply-to: 199708181855.OAA03711@jenolan.rutgers.edu
X-SW-Source: 1997-08/0135.html
Message-ID: <19970819080821.S1vj85E23h4XVB_8oZ-M_HFWyV5ba-M9yv10QdkxNmU@z>

> Before this leaves my head, I wanted to point something out which
> you've reminded me of.  When the scheduler (this applies to both the
> original and Haifa versions equally) becomes aggressive, it produces a
> large number of reloads in certain situations.

The idea of running sched before reload seems to be to improve code like
this:

  move mem1 => pseudo1
  move pseudo1 => mem2
  move mem3 => pseudo2
  move pseudo2 => mem4
  move mem5 => pseudo3
  move pseudo3 => mem6

If this is left as it stands, register allocation will most likely
allocate all three pseudos to the same hard register.  This means the
post-reload sched pass can't do anything with it, and the CPU can't either
because there is no parallelism in the code (well, at least the Pentium
can't).  If sched modifies the above to look like this

  move mem1 => pseudo1
  move mem3 => pseudo2
  move mem5 => pseudo3
  move pseudo1 => mem2
  move pseudo2 => mem4
  move pseudo3 => mem6

you suddenly have two blocks of three independent instructions which could
run in parallel.  However, this will lose badly once you have not three
instructions of that kind, but a hundred (since your average CPU doesn't
have a hundred hard registers).

Another approach I've been thinking about is to add code that analyzes
code like this after reload

  move mem1 => hardreg1
  move hardreg1 => mem2
  move mem3 => hardreg1
  move hardreg1 => mem4
  move mem5 => hardreg1
  move hardreg1 => mem6

and tries to make it use as many independent hard registers as possible.
That would make the scheduling opportunities available without the risk of
over-scheduling before reload.
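To make the post-reload idea concrete, here is a minimal sketch in Python.
It is not GCC code and not a worked-out design: the instruction model (a
list of (dest, src) moves), the function name rename_hard_regs and the
register names r1..r3 are all made up for illustration.  It assumes a
single basic block in which every register is loaded before it is used and
is dead after its first use as a source, which holds for the
mem => reg => mem pattern above but not for code in general.

```python
from collections import deque


def is_mem(operand):
    # Hypothetical convention: memory operands are named "mem...".
    return operand.startswith("mem")


def rename_hard_regs(insns, pool):
    """Spread short-lived values over the registers in `pool`.

    insns: list of (dest, src) moves within one basic block.
    pool:  hard registers assumed to be free across the whole block.
    """
    available = deque(pool)  # least recently freed register sits on the left
    live = {}                # original register -> register now holding its value
    out = []
    for dest, src in insns:
        if not is_mem(src):
            # Use of a register: substitute the renamed one.  The value is
            # assumed dead afterwards, so its register returns to the pool.
            src = live.pop(src)
            available.append(src)
        if not is_mem(dest):
            # New definition: take the register that has been idle longest,
            # so that consecutive load/store pairs end up independent.
            new_dest = available.popleft()
            live[dest] = new_dest
            dest = new_dest
        out.append((dest, src))
    return out


block = [
    ("hardreg1", "mem1"), ("mem2", "hardreg1"),
    ("hardreg1", "mem3"), ("mem4", "hardreg1"),
    ("hardreg1", "mem5"), ("mem6", "hardreg1"),
]
for dest, src in rename_hard_regs(block, ["r1", "r2", "r3"]):
    print("move", src, "=>", dest)
```

With three free registers the three load/store pairs come out using r1, r2
and r3, so a scheduler running afterwards is free to interleave them.  With
fewer free registers the pool is simply reused in rotation, which bounds the
register pressure by the pool size instead of by the number of pairs.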
I don't know how feasible this is.

Bernd