From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-65960-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 16382 invoked by alias); 8 Jan 2003 18:16:25 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 16360 invoked from network); 8 Jan 2003 18:16:16 -0000
Received: from unknown (HELO mail.goquest.com) (12.18.108.6)
  by 209.249.29.67 with SMTP; 8 Jan 2003 18:16:16 -0000
Received: (qmail 2184 invoked by uid 0); 8 Jan 2003 18:15:10 -0000
Received: from mszick@goquest.com by mail.goquest.com by uid 502 with qmail-scanner-1.12 (spamassassin: 2.31. . Clear:. Processed in 1.462891 secs); 08 Jan 2003 18:15:10 -0000
Received: from unknown (HELO localhost.localdomain) (66.90.208.42)
  by mail.goquest.com with SMTP; 8 Jan 2003 18:15:08 -0000
Content-Type: text/plain;
  charset="iso-8859-1"
From: Michael S. Zick <mszick@goquest.com>
To: Andy Walker <ja_walker@earthlink.net>,
 <tm_gccmail@mail.kloo.net>
Subject: Re: An unusual Performance approach using Synthetic registers
Date: Wed, 08 Jan 2003 19:29:00 -0000
Cc: gcc@gcc.gnu.org
References: <Pine.LNX.4.21.0301071008500.18627-100000@mail.kloo.net> <E18W8qS-0007UG-00@avocet.mail.pas.earthlink.net>
In-Reply-To: <E18W8qS-0007UG-00@avocet.mail.pas.earthlink.net>
MIME-Version: 1.0
Message-Id: <03010812102700.00905@localhost.localdomain>
Content-Transfer-Encoding: 8bit
X-SW-Source: 2003-01/txt/msg00441.txt.bz2

On Tuesday 07 January 2003 11:35 pm, Andy Walker wrote:
> On Tuesday 07 January 2003 12:16 pm, tm_gccmail@mail.kloo.net wrote:
> <snip>
> <re: XCHG, exchange, instruction>
>
> > Now that I think about it, it's even worse on the Pentium/Pentium MMX
> > than I initially thought.
> >
> > There's two instruction pipelines on the Pentium: the U pipe and the V
> > pipe. The U pipe can execute all the instructions, but the V pipe can
> > only execute simple instructions.
>
> <snip>
>
> > Toshi
>
> I will take your good advice and not use XCHG as a performance enhancing
> option.
>
> Andy

Andy,

I make no claim that this is anything more than a WAFG...

It was never used as an absolute numerical measure. I only applied "==", "<"
and ">" to it to determine an order among alternative code sequences.

But I have used it as my guide in the past, and it is why I suggested XCHG.

Why:
If the user wanted "Best Size", I dropped the "C" term.
If the user wanted "Best Speed", I dropped the "D" term.
Otherwise, I kept all three terms and just used the diagonal of a cube.

How:
Scaled everything so it could be done with integer math.

Legend:
B == Bus cycles
C == Clock cycles
S == Instruction size (bytes)
D == Instruction size / D-cache line size (S / 64)
Cost == SQRT(256*(B*B + C*C + D*D))
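The same formula in math notation. The factor 256 is 16 squared pulled out of the root, so it only scales the result up far enough to round it to an integer. The two mode variants follow the "Why:" rules above, and the last line checks the formula against the Case 1 row of the table (B = 0, C = 5, D = 0.05):

```latex
\mathrm{Cost} = \sqrt{256\,(B^2 + C^2 + D^2)} = 16\sqrt{B^2 + C^2 + D^2}

\text{Best Size: } 16\sqrt{B^2 + D^2} \qquad \text{Best Speed: } 16\sqrt{B^2 + C^2}

\text{Case 1: } 16\sqrt{0^2 + 5^2 + 0.05^2} = 16\sqrt{25.0025} \approx 80.0
```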

Presumes:
1) A write to the stack meets the "write before read" requirement,
   so the first stack read does not generate a bus cycle.
2) If a temporary is required, use EAX.
3) If EAX is not available, spill/restore it with push/pop.
4) Newer processors will never be worse than the 80386.
5) The D-cache line size is 64 bytes.

Notes:
Case 1 leaves a bus write pending;
follow it with a Reg <-> Reg op to hide the write cycle.

Case 2, the "load/store" version, needs a free register;
follow it with another Reg <-> Reg op if one is available.

Case 3 leaves a bus write pending.
Case 4 is Case 3 with other Reg <-> Reg ops placed before the pop to hide that bus write.

PATH                  B   C   S     D  Cost

Case 1 == Cost 80
xchg ebx, [esp+16]    0   5   3  0.05    80
  (bus write still pending, so put a Reg <-> Reg pad here)

Case 2 == Cost 129
mov  eax, [esp+16]    0   4   3  0.05
mov  [esp+16], ebx    1   2   3  0.05
mov  ebx, eax         0   2   2  0.03
- - - - - - - - - - - - - - - - - - - - - -
Total                 1   8   8  0.13   129

Case 3 == Cost 229
push eax              1   2   1  0.02
mov  eax, [esp+20]    0   4   3  0.05
mov  [esp+20], ebx    1   2   3  0.05
mov  ebx, eax         0   2   2  0.03
pop  eax              1   4   1  0.02
- - - - - - - - - - - - - - - - - - - - - -
Total                 3  14  10  0.16   229

Case 4 == Cost 226
push eax              1   2   1  0.02
mov  eax, [esp+20]    0   4   3  0.05
mov  [esp+20], ebx    0   2   3  0.05
mov  ebx, eax         0   2   2  0.03
  (Reg <-> Reg pad here)
pop  eax              1   4   1  0.02
- - - - - - - - - - - - - - - - - - - - - -
Total                 2  14  10  0.16   226