public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* An unusual Performance approach using Synthetic registers, and a request for guidance.
@ 2002-12-27  5:04 Andy Walker
  2002-12-27 10:55 ` Michael S. Zick
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Walker @ 2002-12-27  5:04 UTC (permalink / raw)
  To: GCC Developer's List

I am working on a somewhat different approach for gcc for the 'x86 machines.  
My assembler experience is from application development on IBM Mainframes - 
the 360/370/390 stuff.  

I find the thought of only 8, or really, 6 general purpose registers being 
available for the compiler to be just rediculous.  On the s/390 equipment, we 
had 16 GP registers, and that was half what we really needed.  

I am modifying gcc to add Synthetic registers.  If I am mistaken, this is a 
big waste of time.  If I am correct, (meaning "really really Lucky") then 
this may provide a significant increase in speed for executables.

The philosophy is straightforward.  I believe that gcc could handle 
intermediate results more efficiently.  Register based values are now handled 
very efficiently.  Memory based values are handled more-or-less efficiently.  
Because there is no specific or formal approach for intermediate results, 
they are not handled efficiently.  I believe that for long functions, more 
than a hundred lines of code, gcc compiled code spends much of its time 
recreating intermediate results again and again.  This is because there is no 
place to store intermediate results, other than in the registers, and so the 
results are discarded and recreated.  

When I was a boy, a register had a physical presence.  It was as big as an 
office desk, and, with power supply, weighed as much as a car.  Now, it is a 
"state" of the machine, and one would be hard pressed to point to the exact 
spot where the register exists, no matter how good the microscope.  

A Synthetic register is a Word of memory, defined to the compiler as a 
register.  If my readings in machine architecture are correct, then for 
modern machines, L1 cache access is as fast, or nearly as fast, as register 
access.  Because the Synthetic registers will be frequently accessed, they 
will remain in L1 cache, and give register-like performance.  The compiler is 
"told" that the Synthetic registers must be accessed with special 
instructions.  In reality, those special instructions are mostly-specified 
"modrm" instructions.  I am using "modrm" instructions to minimize 
instruction length and to decrease complexity. 

My first attempt is to add 32 general purpose Synthetic registers.  The 
contiguous block of Sythetic registers is aligned on whatever block boundary 
is used by the L1 cache.  There must be real memory somewhere from which the 
L1 cache will be drawn.  I am implementing it as an extra, aligned, block of 
128 bytes in the frame.  This means that every function will have its own 
block of Synthetic registers and there will be no need to save or restore 
Synthetic registers.  If your call stack is 10 calls deep, you get 320 
Synthetic registers.  Also, a function compiled with Synthetic registers can 
be linked to, and called from, a function compiled without Synthetic 
registers, with no additional complexity.

Once this part is working, it may be worthwhile to look into varying the 
number of Synthetic registers, depending upon the needs of the compiler for a 
particular function.  I am also quite curious about the value of Synthetic 
floating-point registers. and have done some partial coding in anticipation 
of their use.

There is an additional price for Synthetic registers.  I could not figure out 
a simple way to use the frame pointer, so I dragooned the "C" register, 
register number 2, to be used as the base register for the modrm 
instructions.  Time will tell if this is Dumb, or Dumber.  I am absolutely 
convinced there is a better way.  I just do not know yet what it is. 

I have set up the allocation order so that the Synthetic registers are 
allocated before the general purpose registers.  This way, general purpose 
registers are reserved for situations where Synthetic registers are not 
appropriate.

Synthetic registers are an imperfect answer.  They are limited to binary-type 
instructions.  They cannot be used as base or index registers.  They cannot 
be directly copied to memory.  Describing the instructions to the compiler as 
register instructions when in fact data is written into the L1 cache may be 
too cute.  I fear that the scheduler will not see the difference, and the 
result will be frequent instruction stalls, something that I hoped this 
approach would reduce.

So much for the Big Talk.  Obviously, if this were all working now, I would 
not be asking for help, instead, I would be announcing where to get a copy of 
the modified code.  

I have many questions, mostly of a trivial nature, and would greatly 
appreciate any help.

		Andrew Walker

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: An unusual Performance approach using Synthetic registers, and a request for guidance.
@ 2002-12-27  6:48 Bonzini
  2002-12-27 20:07 ` Dave Hudson
  2002-12-29  1:57 ` Andy Walker
  0 siblings, 2 replies; 14+ messages in thread
From: Bonzini @ 2002-12-27  6:48 UTC (permalink / raw)
  To: gcc; +Cc: ja_walker

That's not unusual.  If one had, say, to write a back-end for the 6502, one
might think of representing the zero-page as caller-save registers, and hide
the weeny three registers A,X,Y from the middle-end (using them only
internally, or very near to this).  I also hope that because of the good L1
coherency of the stack, this might pay well on the x86 as well.  The only
problem might be in the lack of orthogonality in the x86 instruction set.

Go ahead, and good luck!

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2002-12-29 17:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-27  5:04 An unusual Performance approach using Synthetic registers, and a request for guidance Andy Walker
2002-12-27 10:55 ` Michael S. Zick
2002-12-27 11:44   ` Michael S. Zick
2002-12-28  4:58   ` Alexandre Oliva
2002-12-28  9:43     ` Michael S. Zick
2002-12-29  5:33       ` Andy Walker
2002-12-29  4:28     ` Andy Walker
2002-12-29  2:18   ` Andy Walker
2002-12-27  6:48 Bonzini
2002-12-27 20:07 ` Dave Hudson
2002-12-29  2:23   ` Andy Walker
2002-12-29  1:57 ` Andy Walker
2002-12-29  6:09   ` Bonzini
2002-12-29 12:04     ` Andy Walker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).