public inbox for gcc@gcc.gnu.org
* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 18:13 Robert Dewar
  2003-01-05 20:15 ` Michael S. Zick
  0 siblings, 1 reply; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 18:13 UTC (permalink / raw)
  To: dewar, lord, mszick; +Cc: gcc, ja_walker

> I have to strongly disagree with that observation.

I am talking very specifically here of the issue of taking register
renaming into account in register allocation and scheduling algorithms.
I certainly would be interested in any references you know of in this
area.

* Re: Sythetic registers: modrm/gas question.
@ 2003-01-07  3:13 Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-07  3:13 UTC (permalink / raw)
  To: ja_walker, mstump; +Cc: gcc

> You will want to learn how to ask GAS these questions.  Hint, type in 
> the assembly, run as foo.s, and then objdump (maybe -d) the result.  
> This is faster and generally more accurate than asking us.

Actually, I disagree with this. An experiment like this can only tell you
that in a particular situation, GAS does a particular thing. An experiment
cannot tell you the general rules, and the generated code must rely on
these rules, which need to be clearly stated. In fact I don't know whether
there are clearly stated rules for GAS, though in this particular case it
is obvious that GAS must use the shortest displacement encoding, since failing
to do so would clearly be a bug.
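The general rule here comes from the x86 ModRM format: the `mod` field selects no displacement, a sign-extended 8-bit displacement, or a 32-bit displacement, and an assembler picks the shortest form that fits. A minimal Python sketch of that choice, assuming 32-bit addressing; `REGS` and `displacement_encoding` are illustrative names, not taken from GAS's source, and the SIB byte an `esp` base additionally needs is not modelled:

```python
# r/m field values for the eight 32-bit general registers.
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def displacement_encoding(base, disp):
    """Return (mod, number of displacement bytes) for a [base + disp] operand."""
    if not -2**31 <= disp < 2**31:
        raise ValueError("displacement does not fit in 32 bits")
    # mod=00 normally means "no displacement", but mod=00 with r/m=101 (ebp)
    # is reused for an absolute disp32 address, so [ebp] needs an explicit disp8 of 0.
    if disp == 0 and REGS[base] != 5:
        return 0b00, 0
    if -128 <= disp <= 127:
        return 0b01, 1      # mod=01: sign-extended 8-bit displacement
    return 0b10, 4          # mod=10: full 32-bit displacement

# Andy's example, MOV eax,[ebp + 4], fits the short form:
print(displacement_encoding("ebp", 4))   # (1, 1)
```

Note that every local within 128 bytes of `ebp` on either side gets the one-byte form.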

* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 14:24 Robert Dewar
  2003-01-05 16:56 ` Michael S. Zick
  2003-01-05 22:33 ` Tom Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 14:24 UTC (permalink / raw)
  To: dewar, lord; +Cc: gcc, ja_walker

> Synthregs improve locality at a slight cost in code size.  I agree
> with ja_walker that the worthiness of that trade-off is an empirical
> question worth measuring.  I'd go beyond him by saying that in the 
> medium term future, the trade-off almost certainly wins often.

I disagree. In practice references to the local stack frame are nearly
always in cache. So there is just nothing much to improve. Actually it
is the instruction cache that causes problems more often than the data cache,
and making code larger is almost always a loss (that's why, for example,
-O3 frequently slows things down compared with -O2; and in the case of
Ada, where it is more normal to control inlining explicitly, -O3 is
almost always worse).

> x86 is a "rapidly" exploding range of physical architectures.

True, but the basic observation above holds for all of them, from the
486 onwards.

> "Yeah, well, until you've done the same you've no
> business talking about the SR proposal"

Gosh, you took the words out of my mouth :-) :-)

Seriously, I do think this needs to be discussed at a detailed level, which
is why I asked for an example. If we look at a specific example, then we
can fill in any details that are needed for the example, and not rely on
everyone having detailed knowledge of the architecture.

By the way, to repeat an idea that I think is definitely worth following up
on, register renaming plays a very important part in these architectures.
The Pentium 4 has well over a hundred physical registers for renaming (128
integer entries; I can't be bothered to look up the finer details :-) Only 8
of these are directly nameable, but in a real program, many of them can get used.

I think it would be really interesting to study the issue of taking the
renaming into account when allocating registers. Consider

   mov ax, mem1
   add ax, 1
   mov mem1, ax
   mov ax, mem2
   add ax, 2
   mov mem2, ax

Classical optimization suggests

   mov ax, mem1
   mov bx, mem2
   add ax, 1
   add bx, 2
   mov mem1, ax
   mov mem2, bx

so that the two operations can be done in parallel in separate pipelines.
In fact the first sequence is better: register renaming will still allow
the use of two separate pipelines, and you don't use up another nameable
register, and nameable registers are what is in very short supply.
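Why the first sequence does not serialize can be seen with a toy renamer. This is a minimal sketch under loudly simplified assumptions: an unlimited pool of physical registers, no flags, no memory dependences, no retirement. The names `rename`, `ARCH_REGS`, and the `p0`, `p1`, ... physical names are hypothetical, and the Pentium 4's actual renamer differs in many details:

```python
from itertools import count

ARCH_REGS = {"ax", "bx", "cx", "dx"}   # nameable registers in this toy model

def rename(instrs):
    """Map each (dest, sources) op onto fresh physical registers."""
    fresh = count()
    current = {}                        # architectural name -> current physical register
    renamed = []
    for dest, srcs in instrs:
        # Sources read whichever physical register currently holds the value;
        # memory operands pass through unchanged.
        phys_srcs = [current.get(s, s) for s in srcs]
        if dest in ARCH_REGS:
            # Every write gets a fresh physical register, so reusing the name
            # "ax" later creates no false (write-after-read/write) dependence.
            current[dest] = f"p{next(fresh)}"
            dest = current[dest]
        renamed.append((dest, phys_srcs))
    return renamed

first_sequence = [
    ("ax", ["mem1"]),    # mov ax, mem1
    ("ax", ["ax"]),      # add ax, 1   (immediate omitted)
    ("mem1", ["ax"]),    # mov mem1, ax
    ("ax", ["mem2"]),    # mov ax, mem2
    ("ax", ["ax"]),      # add ax, 2
    ("mem2", ["ax"]),    # mov mem2, ax
]
for op in rename(first_sequence):
    print(op)
# ('p0', ['mem1'])
# ('p1', ['p0'])
# ('mem1', ['p1'])
# ('p2', ['mem2'])
# ('p3', ['p2'])
# ('mem2', ['p3'])
```

The second chain (`p2`, `p3`) shares no physical register with the first (`p0`, `p1`), so the hardware can overlap them even though the program names only `ax`.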

The above is just an illustrative example of the kind of thing I am
talking about here. The devil is in the details (as with the SR proposal
itself), and you really have to know EXACTLY how the renaming works to
make sure you generate code that cooperates with it.

I am trying to interest a PhD student in doing research in this area :-)

Some work has been done, but not that much.

* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 13:25 Robert Dewar
  2003-01-06  5:36 ` Andy Walker
  2003-01-06 19:42 ` Tom Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 13:25 UTC (permalink / raw)
  To: dewar, lord; +Cc: gcc, ja_walker

> The question since it could almost certainly have been answered with a
> five-minute experiment (what, you don't have the necessary platform
> around?)

It's a little more fundamental than that. It is central to the design
of the x86 that the variable length offsets are optimized by the assembler.


> the synthregs proposal is argued for 
> at a much higher level than would require that close an examination
> of GCC's generated code

And that's the problem. It is being argued at too high a level, and it creates
the impression of some principle that I do not believe will in fact show
up as improved code. You *DO* need to examine GCC's generated code to realize
that the code we generate today is essentially equivalent to the idea of
SRs.


> Or are you good at simulating complex caches in your head based
> on disassembly listings and demand that we all are as well?

Well yes I am pretty good at it, but I definitely do NOT like to depend
on doing this, and I do NOT expect others to do it. What is needed are
some real examples so that we are NOT doing things in our head.

I certainly apologize if my answer seemed rude, but it was an attempt to
dig down into the details, since that is where the substance of the
argument will play out usefully.

Actually I think the idea that the issue is in any sense related to the
operation of complex caches is completely bogus. In practice in typical
x86 code, nearly all EBP references with small offsets (references to
arguments or locals in the current stack frame) are in L1 cache. All our
data shows that, so if the idea of the SR proposal is somehow to improve
cache performance, that is unlikely to work out in practice. GCC already
does a pretty aggressive job of moving most references to the local
stack frame (either to variables there or to spill locations).
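One way to see the footprint involved without simulating a cache in one's head: everything an 8-bit displacement can reach from `ebp` lies in a 256-byte window, so at most a handful of cache lines. A minimal sketch, assuming 64-byte lines and 4-byte accesses; `cache_lines_touched` is a hypothetical helper, and it only counts lines, it does not model replacement or prove L1 residency:

```python
def cache_lines_touched(frame_base, offsets, line_size=64):
    """Count distinct cache lines covered by 4-byte accesses at frame_base + offset."""
    lines = set()
    for off in offsets:
        start = frame_base + off
        end = start + 3                      # last byte of a 4-byte access
        for line in range(start // line_size, end // line_size + 1):
            lines.add(line)
    return len(lines)

# Every 4-byte slot reachable with a disp8, i.e. [ebp-128] .. [ebp+124]:
disp8_slots = range(-128, 128, 4)
print(cache_lines_touched(0x1000, disp8_slots))   # 4 (ebp 64-byte aligned)
print(cache_lines_touched(0x1004, disp8_slots))   # 5 (ebp misaligned)
```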

Remember that the code you get for accessing a synthetic register is 
identical *in all respects* to the code you get for accessing local
variables and arguments. 

The question here seemed to imply that the fundamental idea behind the
SR proposal was to take advantage of the 3-byte instruction format (opcode,
MODRM byte, 8-bit displacement) for efficient access to SRs, coupled with a
belief that GCC was using the 6-byte format (with a 32-bit displacement) for
normal memory references. If that were true, it would be of some interest,
but it is simply not true.
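The 3-byte versus 6-byte difference can be checked by hand for the instruction in Andy's original question. A minimal sketch, assuming 32-bit mode and the `8B /r` form of `mov r32, r/m32`; `encode_mov_eax_ebp` is a hypothetical helper, not part of any assembler:

```python
import struct

def encode_mov_eax_ebp(disp):
    """Encode `mov eax, [ebp + disp]` using the shortest displacement form."""
    opcode = 0x8B                    # MOV r32, r/m32
    reg, rm = 0, 5                   # reg field = eax, r/m field = ebp
    if -128 <= disp <= 127:
        mod, tail = 0b01, struct.pack("<b", disp)   # 1-byte signed displacement
    else:
        mod, tail = 0b10, struct.pack("<i", disp)   # 4-byte little-endian displacement
    modrm = (mod << 6) | (reg << 3) | rm
    return bytes([opcode, modrm]) + tail

print(encode_mov_eax_ebp(4).hex(" "))     # 8b 45 04           (3 bytes)
print(encode_mov_eax_ebp(512).hex(" "))   # 8b 85 00 02 00 00  (6 bytes)
```

Only a frame larger than 128 bytes in one direction pushes an access into the 6-byte form, which is the point being made about GCC's output.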

You really HAVE to look at specific x86 assembly language sequences to
see whether there is anything in this idea or not. Yes, there are some
machines on which the idea might play out effectively, but I am pretty
convinced that the x86 is not one of them.

* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 12:28 Robert Dewar
  2003-01-06  5:04 ` Andy Walker
  2003-01-06 19:47 ` Tom Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 12:28 UTC (permalink / raw)
  To: gcc, ja_walker

> How do I tell gas to assemble an operand as a one-byte offset instead of a 
> four-byte offset?
> 
> e.g. 
> 
> MOV     eax,[ebp + 4]
> 
> Does it "just know?"
> 
> Andy

Yes, of course it "just knows"; that's why your guess that gcc is generating
poor code seems ill-informed. It is inconceivable that *any* compiler would
use four-byte offsets to access the local stack frame.

* Sythetic registers: modrm/gas question.
@ 2003-01-05  6:08 Andy Walker
  2003-01-06 23:54 ` Mike Stump
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Walker @ 2003-01-05  6:08 UTC (permalink / raw)
  To: gcc

How do I tell gas to assemble an operand as a one-byte offset instead of a 
four-byte offset?

e.g. 

MOV	eax,[ebp + 4]

Does it "just know?"

Andy



Thread overview: 16+ messages
2003-01-05 18:13 Sythetic registers: modrm/gas question Robert Dewar
2003-01-05 20:15 ` Michael S. Zick
  -- strict thread matches above, loose matches on Subject: below --
2003-01-07  3:13 Robert Dewar
2003-01-05 14:24 Robert Dewar
2003-01-05 16:56 ` Michael S. Zick
2003-01-05 17:37   ` Michael S. Zick
2003-01-05 22:33 ` Tom Lord
2003-01-05 13:25 Robert Dewar
2003-01-06  5:36 ` Andy Walker
2003-01-06 19:42 ` Tom Lord
2003-01-05 12:28 Robert Dewar
2003-01-06  5:04 ` Andy Walker
2003-01-06 19:47 ` Tom Lord
2003-01-05  6:08 Andy Walker
2003-01-06 23:54 ` Mike Stump
2003-01-07  6:15   ` Andy Walker
