From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Hubicka
To: Frank Klemm
Cc: Jan Hubicka, gcc@gcc.gnu.org
Subject: Re: Multiplications on Pentium 4
Date: Sat, 08 Sep 2001 08:28:00 -0000
Message-id: <20010908172804.C8451@atrey.karlin.mff.cuni.cz>
References: <20010827143032.C636@fuchs.offl.uni-jena.de>
 <20010827173025.F11402@atrey.karlin.mff.cuni.cz>
 <20010901202854.A7713@fuchs.offl.uni-jena.de>
 <20010902000000.C27182@atrey.karlin.mff.cuni.cz>
 <20010902024104.F7713@fuchs.offl.uni-jena.de>
 <20010903171717.E13574@atrey.karlin.mff.cuni.cz>
 <20010904215156.C438@fuchs.offl.uni-jena.de>
 <007001c1358e$6f53b6f0$7edd18ac@amr.corp.intel.com>
 <20010905134405.G15564@atrey.karlin.mff.cuni.cz>
 <20010907200403.A5281@fuchs.offl.uni-jena.de>
X-SW-Source: 2001-09/msg00268.html

> The Pentium 4 is so different from all other CPUs so I must write a special
> Code Choice Generator. Some Examples:
>
>
>   imul:             14   Clocks Latency
>   shl:               4   Clocks Latency
>   lea (,,1)          0.5 Clocks Latency
>   lea (,,2)          4   Clocks Latency
>   lea (,,4)          4   Clocks Latency
>   lea (,,8)          4   Clocks Latency

Actually, lea with scale 2 (,,2) can be rewritten as a lea doing an addition,
which is faster. The rule is that a shift has 4 cycles of latency, while an
add has 0.5. Lea is broken down into trivial operations, so for your
measurements you can probably ignore its existence.

>   add, sub, neg:     0.5 Clocks Latency
>   mov           0...0.5 Clocks Latency
>
> This generates fully different Code compared with i386...Pentium-III,
> K5...Athlon.

Agreed. That's the problem. The other problem is that the extreme latency of
imul and shifts means we can benefit from replacing them with relatively many
adds, but the P4 is limited by its trace cache. More adds mean less cache
space, so this tradeoff needs to be controlled mainly by the program's
profile, to find hot spots, and additionally by the scheduler, to shorten only
the critical paths through a basic block (BB). This is _extremely_ difficult
to integrate into the existing gcc model.
I hope that Intel will realize that and provide some funding for gcc
development, as good Pentium 4 support will be tricky.

Honza
>
> Optimizing code for size is easy. It's the same as for other CPUs.
>
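
The latency/size trade-off described in this message can be sketched as a
tiny cost model. This is only an illustration under stated assumptions, not
how gcc does or should implement it: the latencies are the figures quoted in
the thread (add 0.5, shl 4, imul 14 clocks), the instruction count is used as
a crude stand-in for trace cache footprint, and all names (plan,
adds_only_plan, mul_by_adds, prefer_adds, shift_beats_adds, max_ops) are
hypothetical.

```c
#include <assert.h>

/* Latencies in half-cycles so that 0.5 clocks stays an integer.
   These are the figures quoted in the thread, not measured here. */
enum { LAT_ADD = 1, LAT_SHL = 8, LAT_IMUL = 28 };

/* Hypothetical cost record: ops stands in for trace cache footprint,
   latency is the length of the dependent chain in half-cycles. */
struct plan {
    int ops;
    int latency;
};

/* Index of the highest set bit of c; c must be nonzero. */
int top_bit(unsigned c)
{
    int top = 0;
    assert(c != 0);
    while (c >>= 1)
        top++;
    return top;
}

/* Cost of computing x*c by double-and-add, where every doubling is
   "add r,r" instead of a shift.  Each step depends on the previous
   value of r, so the whole chain is serial. */
struct plan adds_only_plan(unsigned c)
{
    struct plan p = { 0, 0 };
    for (int bit = top_bit(c) - 1; bit >= 0; bit--) {
        p.ops++;                    /* r += r */
        if ((c >> bit) & 1u)
            p.ops++;                /* r += x */
    }
    p.latency = p.ops * LAT_ADD;
    return p;
}

/* Executes the same chain, to check that it really multiplies
   (modulo the width of unsigned). */
unsigned mul_by_adds(unsigned x, unsigned c)
{
    unsigned r = x;
    for (int bit = top_bit(c) - 1; bit >= 0; bit--) {
        r += r;
        if ((c >> bit) & 1u)
            r += x;
    }
    return r;
}

/* Nonzero if the add chain beats imul on latency AND fits the
   assumed per-multiply instruction budget max_ops. */
int prefer_adds(unsigned c, int max_ops)
{
    struct plan p = adds_only_plan(c);
    return p.latency < LAT_IMUL && p.ops <= max_ops;
}

/* Nonzero if one shl by k has lower latency than k chained "add r,r". */
int shift_beats_adds(int k)
{
    return LAT_SHL < k * LAT_ADD;
}
```

With these numbers, multiplying by 10 takes four chained adds (2 clocks
against 14 for imul), while multiplying by 255 takes 14 adds (7 clocks): still
faster than imul, but rejected by prefer_adds(255, 8) because of the size
budget. In a real compiler that budget would presumably depend on profile
data and on whether the multiply sits on a critical path, which is exactly
the part the message calls hard to integrate.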