Re: Sythetic registers: modrm/gas question.

public inbox for gcc@gcc.gnu.org

* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 18:13 Robert Dewar
  2003-01-05 20:15 ` Michael S. Zick
  0 siblings, 1 reply; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 18:13 UTC (permalink / raw)
  To: dewar, lord, mszick; +Cc: gcc, ja_walker

> I have to strongly disagree with that observation.

I am talking very specifically here of the issue of taking register
renaming into account in register allocation and scheduling algorithms.
I certainly would be interested in any references you know of in this
area.


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 18:13 Sythetic registers: modrm/gas question Robert Dewar
@ 2003-01-05 20:15 ` Michael S. Zick
  0 siblings, 0 replies; 16+ messages in thread
From: Michael S. Zick @ 2003-01-05 20:15 UTC (permalink / raw)
  To: Robert Dewar, dewar, lord, mszick; +Cc: gcc, ja_walker

On Sunday 05 January 2003 11:59 am, Robert Dewar wrote:
> > I have to strongly disagree with that observation.
>
> I am talking very specifically here of the issue of taking register
> renaming into account in register allocation and scheduling algorithms.
> I certainly would be interested in any references you know of in this
> area.
My error, I mis-read the statement.
My mind was somewhere else while my hands were typing.

This entire thread has spurred me to dig out my project notes from
a globally optimizing meta-assembler done about 23 years ago.

I am doing a re-write of the principles and concepts with GCC
in mind. I'll let the list know when I have the draft finished.

If GCC really isn't doing better than a W.A.F.G. in this area,
perhaps it will help.

Mike


* Re: Sythetic registers: modrm/gas question.
@ 2003-01-07  3:13 Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-07  3:13 UTC (permalink / raw)
  To: ja_walker, mstump; +Cc: gcc

> You will want to learn how to ask GAS these questions.  Hint, type in 
> the assembly, run as foo.s, and then objdump (maybe -d) the result.  
> This is faster and generally more accurate than asking us.

Actually, I disagree with this. An experiment like this can only tell you
that in a particular situation, GAS does a particular thing. An experiment
cannot tell you the general rules, and the generated code must rely on
these rules, which need to be clearly stated. In fact I don't know whether
there are clearly stated rules for GAS, though in this particular case it
is obvious that GAS must minimize offsets, since if it did not, that would
have the status of being a clear bug.


* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 14:24 Robert Dewar
  2003-01-05 16:56 ` Michael S. Zick
  2003-01-05 22:33 ` Tom Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 14:24 UTC (permalink / raw)
  To: dewar, lord; +Cc: gcc, ja_walker

> Synthregs improve locality at a slight cost in code size.  I agree
> with ja_walker that the worthiness of that trade-off is an empirical
> question worth measuring.  I'd go beyond him by saying that in the 
> medium term future, the trade-off almost certainly wins often.

I disagree. In practice references to the local stack frame are nearly
always in cache. So there is just nothing much to improve. Actually it
is instruction cache that causes problems more often than data cache,
and making code larger is almost always a loss (that's why, for example,
-O3 frequently slows things down compared with -O2, and in the case of
Ada, where it is more normal to explicitly control inlining, -O3 is
almost always worse).

> x86 is a "rapidly" exploding range of physical architectures.

True, but the basic observation above holds for all of them, from the
486 onwards.

> "Yeah, well, until you've done the same you've no
> business talking about the SR proposal"

Gosh, you took the words out of my mouth :-) :-)

Seriously, I do think this needs to be discussed at a detailed level, which
is why I asked for an example. If we look at a specific example, then we
can fill in any details that are needed for the example, and not rely on
everyone having detailed knowledge of the architecture.

By the way, to repeat an idea that I think is definitely worth following up
on, register renaming plays a very important part in these architectures.
The Pentium-4 has something like 40 registers (can't remember exact number
and can't be bothered to look it up :-) Only 8 of these are directly 
nameable, but in a real program, many of these can get used.

I think it would be really interesting to study the issue of taking the
renaming into account when allocating registers. Consider

   mov ax, mem1
   add ax, 1
   mov mem1, ax
   mov ax, mem2
   add ax, 2
   mov mem2, ax

Classical optimization suggests

   mov ax, mem1
   mov bx, mem2
   add ax, 1
   add bx, 2
   mov mem1, ax
   mov mem2, bx

so that the two operations can be done in parallel in separate pipelines,
but in fact the first sequence is better, since register renaming will
allow the use of two separate pipelines, and you don't use up another
nameable register; those are in very short supply.
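
The effect described above can be illustrated with a toy model. This is a sketch under loud assumptions, not a description of how any real Pentium renames registers: every instruction takes one cycle, registers and memory cells are treated as plain names, and the function `critical_path` and the two sequence lists are hypothetical names introduced only for this illustration. It computes the longest dependency chain with and without renaming.

```python
def critical_path(seq, rename):
    """Length of the longest dependency chain, assuming one cycle per instruction.

    seq is a list of (destination, [sources]) pairs; registers and memory
    cells are both treated as plain names, which is a simplification.
    """
    last_writer = {}  # name -> index of the instruction that last wrote it
    readers = {}      # name -> instructions that read it since that write
    depth = []
    for i, (dest, srcs) in enumerate(seq):
        # True (read-after-write) dependencies always apply.
        deps = [last_writer[s] for s in srcs if s in last_writer]
        if not rename:
            # Without renaming, reusing a name also serialises on the
            # previous writer (WAW) and on outstanding readers (WAR).
            if dest in last_writer:
                deps.append(last_writer[dest])
            deps.extend(readers.get(dest, []))
        depth.append(1 + max((depth[d] for d in deps), default=0))
        for s in srcs:
            readers.setdefault(s, []).append(i)
        last_writer[dest] = i
        readers[dest] = []
    return max(depth)


# The single-register sequence: load, add, store, twice, all through ax.
one_register = [("ax", ["mem1"]), ("ax", ["ax"]), ("mem1", ["ax"]),
                ("ax", ["mem2"]), ("ax", ["ax"]), ("mem2", ["ax"])]
# The "classical" interleaved sequence using ax and bx.
two_registers = [("ax", ["mem1"]), ("bx", ["mem2"]), ("ax", ["ax"]),
                 ("bx", ["bx"]), ("mem1", ["ax"]), ("mem2", ["bx"])]

for name, seq in [("one register", one_register),
                  ("two registers", two_registers)]:
    print(name, critical_path(seq, rename=True),
          critical_path(seq, rename=False))
```

In this model the single-register sequence has a chain of 3 with renaming and 6 without, while the two-register sequence is 3 either way, so with renaming the extra nameable register buys nothing.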

The above is just an illustrative example of the kind of thing I am
talking about here. The devil is in the details (as with the SR proposal
itself), and you really have to know EXACTLY how the renaming works to
make sure you generate code that cooperates with it.

I am trying to interest a PhD student to do research in this area :-)

There is some work done, but not that much.


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 14:24 Robert Dewar
@ 2003-01-05 16:56 ` Michael S. Zick
  2003-01-05 17:37   ` Michael S. Zick
  2003-01-05 22:33 ` Tom Lord
  1 sibling, 1 reply; 16+ messages in thread
From: Michael S. Zick @ 2003-01-05 16:56 UTC (permalink / raw)
  To: Robert Dewar, dewar, lord; +Cc: gcc, ja_walker

On Sunday 05 January 2003 08:17 am, Robert Dewar wrote:
> > Synthregs improve locality at a slight cost in code size.  I agree
> > with ja_walker that the worthiness of that trade-off is an empirical
> > question worth measuring.  I'd go beyond him by saying that in the
> > medium term future, the trade-off almost certainly wins often.
>
> I disagree. In practice references to the local stack frame are nearly
> always in cache. So there is just nothing much to improve. Actually it
> is instruction cache that causes problems more often than data cache,
> and making code larger is almost always a loss (that's why, for example,
> -O3 frequently slows things down compared with -O2, and in the case of
> Ada, where it is more normal to explicitly control inlining, -O3 is
> almost always worse).
>
> > x86 is a "rapidly" exploding range of physical architectures.
>
> True, but the basic observation above holds for all of them, from the
> 486 onwards.
>
> > "Yeah, well, until you've done the same you've no
> > business talking about the SR proposal"
>
> Gosh, you took the words out of my mouth :-) :-)
>
> Seriously, I do think this needs to be discussed at a detailed level, which
> is why I asked for an example. If we look at a specific example, then we
> can fill in any details that are needed for the example, and not rely on
> everyone having detailed knowledge of the architecture.
>
> By the way, to repeat an idea that I think is definitely worth following up
> on, register renaming plays a very important part in these architectures.
> The Pentium-4 has something like 40 registers (can't remember exact number
> and can't be bothered to look it up :-) Only 8 of these are directly
> nameable, but in a real program, many of these can get used.
>
> I think it would be really interesting to study the issue of taking the
> renaming into account when allocating registers. Consider
>
>    mov ax, mem1
>    add ax, 1
>    mov mem1, ax
>    mov ax, mem2
>    add ax, 2
>    mov mem2, ax
>
> Classical optimization suggests
>
>    mov ax, mem1
>    mov bx, mem2
>    add ax, 1
>    add bx, 2
>    mov mem1, ax
>    mov mem2, bx
>
> so that the two operations can be done in parallel in separate pipelines,
> but in fact the first sequence is better, since register renaming will
> allow the use of two separate pipelines, and you don't use up another
> nameable register; those are in very short supply.
>
For certain sure...
Trying to be too smart in the instruction scheduling area will only
result in the compiler designer "shooting himself in the foot" with
some of the modern day processors.

I.e., using a lot of compiler time, to generate a lot of code, with the
only real effect being defeating the effectiveness of the CPU's
renaming mechanisms.

> The above is just an illustrative example of the kind of thing I am
> talking about here. The devil is in the details (as with the SR proposal
> itself), and you really have to know EXACTLY how the renaming works to
> make sure you generate code that cooperates with it.
>
> I am trying to interest a PhD student to do research in this area :-)
>
Another area for improvement:
Doing reverential pattern, frequency analysis of data over program
flow and then, with knowledge of the data cache size, mapping and
replacement algorithm, order and pack the data allocations with
the goal that they are always "cache hot".
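
One way to read this suggestion, purely as a hedged sketch: count how often each datum is referenced in a recorded access trace, then lay the data out hottest-first so the most-used items share as few cache lines as possible. The names `pack_by_frequency` and `cache_line`, the trace format, and the 64-byte line size are all assumptions made for the illustration; a real pass would also have to model associativity, the replacement policy, and alignment.

```python
from collections import Counter


def pack_by_frequency(trace, sizes):
    """Return {name: byte offset}, placing the most referenced data first.

    trace is a list of names in access order; sizes maps name -> bytes.
    Ties are broken alphabetically so the layout is deterministic.
    """
    counts = Counter(trace)
    order = sorted(sizes, key=lambda name: (-counts[name], name))
    layout, offset = {}, 0
    for name in order:
        layout[name] = offset
        offset += sizes[name]
    return layout


def cache_line(offset, line_size=64):
    """Index of the cache line containing the given byte offset."""
    return offset // line_size


trace = ["a", "c", "a", "c", "b"]
sizes = {"a": 32, "b": 64, "c": 32}
layout = pack_by_frequency(trace, sizes)
print(layout)
print(cache_line(layout["a"]) == cache_line(layout["c"]))
```

Here the two hot items `a` and `c` end up in the same line, and the colder `b` is pushed to the next one.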

>
> There is some work done, but not that much.
I have to strongly disagree with that observation.

Mike


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 16:56 ` Michael S. Zick
@ 2003-01-05 17:37   ` Michael S. Zick
  0 siblings, 0 replies; 16+ messages in thread
From: Michael S. Zick @ 2003-01-05 17:37 UTC (permalink / raw)
  To: Robert Dewar, dewar, lord; +Cc: gcc, ja_walker

On Sunday 05 January 2003 10:45 am, Michael S. Zick wrote:
>
> Another area for improvement:
> Doing reverential pattern, frequency analysis of data over program
- - - - ^ ^ ^ ^ ^
referential

My proof reader is still on vacation.

Mike


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 14:24 Robert Dewar
  2003-01-05 16:56 ` Michael S. Zick
@ 2003-01-05 22:33 ` Tom Lord
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Lord @ 2003-01-05 22:33 UTC (permalink / raw)
  To: dewar; +Cc: dewar, gcc, ja_walker

       I disagree. In practice references to the local stack frame are
       nearly always in cache.

Yeah, I see what you're saying (I think):

Blindly tricking the register allocator isn't _quite_ right.  It has
to know not to bother storing locals and args in synthregs, and not to
bother moving a spilled value to a synthreg.

The thing I like about synthregs is the idea of moving some values not
in those classes into locations that will be cache-favored.  Just
tricking the reg allocator seemed initially like an easy way to do it,
but it's (hopefully only slightly) more complicated than that.

    > "Yeah, well, until you've done the same you've no
    > business talking about the SR proposal"

    Gosh, you took the words out of my mouth :-) :-)

It's hard to articulate.  I believe there's a perspective on
architecture that transcends particular machines and that tends to
predict the future pretty well.  It's sort of like you look at the
logical dependencies among various parts of the state of the abstract
machine -- and those logical dependencies tell you a lot about how
machines can be implemented and optimized (they are computations that
have to be physically realized) -- and real machines tend sharply,
over time, to take advantage of those optimizations.  It's hard to
articulate.

It will take me a little while to digest the rest of your post.

-t


* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 13:25 Robert Dewar
  2003-01-06  5:36 ` Andy Walker
  2003-01-06 19:42 ` Tom Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 13:25 UTC (permalink / raw)
  To: dewar, lord; +Cc: gcc, ja_walker

> The question since it could almost certainly have been answered with a
> five-minute experiment (what, you don't have the necessary platform
> around?)

It's a little more fundamental than that. It is central to the design
of the x86 that the variable length offsets are optimized by the assembler.

> the synthregs proposal is argued for 
> at a much higher level than would require that close an examination
> of GCC's generated code

And that's the problem. It is being argued at too high a level, and creates
the impression of some principle that in fact I do not believe will show
up as improved code. You *DO* need to examine GCC's generated code to realize
that the code we generate today is essentially equivalent to the idea of
SR's. 

> Or are you good at simulating complex caches in your head based
> on disassembly listings and demand that we all are as well?

Well yes I am pretty good at it, but I definitely do NOT like to depend
on doing this, and I do NOT expect others to do it. What is needed are
some real examples so that we are NOT doing things in our head.

I certainly apologize if my answer seemed rude, but it was an attempt to
try to dig down into the details, since that is where the substance of
the argument will play out in a useful manner.

Actually I think the idea that the issue is in any sense related to the
operation of complex caches is completely bogus. In practice in typical
x86 code, nearly all EBP references with small offsets (references to
arguments or locals in the current stack frame) are in L1 cache. All our
data shows that, so if the idea of the SR proposal is somehow to improve
cache performance, that is unlikely to work out in practice. GCC already
does a pretty aggressive job of moving most references to the local
stack (either by referencing variables there, or spill locations).

Remember that the code you get for accessing a synthetic register is 
identical *in all respects* to the code you get for accessing local
variables and arguments. 

The question here seemed to imply that the fundamental idea behind the
SR proposal was to take advantage of the 3-byte MODRM format for efficient
access to SR's, coupled with a belief that GCC was using a 6-byte MODRM
format for normal memory references. If that were true it would have
some interest, but it is simply not true.
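
For concreteness, the two encodings being contrasted can be sketched as follows. This is an illustration of the standard 32-bit x86 encoding of `mov eax, [ebp + disp]`, not of GAS's actual code; the helper name `encode_mov_eax_ebp` is hypothetical. It picks the one-byte displacement form whenever the offset fits in a signed byte, which is the choice an assembler is expected to make.

```python
import struct


def encode_mov_eax_ebp(disp):
    """Encode `mov eax, [ebp + disp]` for 32-bit mode, shortest form first."""
    opcode = 0x8B                      # MOV r32, r/m32
    reg_eax, rm_ebp = 0b000, 0b101
    if -128 <= disp <= 127:
        # mod=01: an 8-bit signed displacement follows the ModRM byte.
        modrm = (0b01 << 6) | (reg_eax << 3) | rm_ebp
        return bytes([opcode, modrm]) + struct.pack("<b", disp)
    # mod=10: a 32-bit little-endian displacement follows the ModRM byte.
    modrm = (0b10 << 6) | (reg_eax << 3) | rm_ebp
    return bytes([opcode, modrm]) + struct.pack("<i", disp)


print(encode_mov_eax_ebp(4).hex())       # short form, 3 bytes
print(encode_mov_eax_ebp(0x1000).hex())  # long form, 6 bytes
```

A small offset such as 4 yields the 3-byte form `8b 45 04`; only an offset outside the range -128..127 needs the 6-byte form.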

You really HAVE to look at specific x86 assembly language sequences to
see whether there is anything in this idea or not. Yes, there are some
machines on which the idea might play out effectively, but I am pretty
convinced that the x86 is not one of them.


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 13:25 Robert Dewar
@ 2003-01-06  5:36 ` Andy Walker
  2003-01-06 19:42 ` Tom Lord
  1 sibling, 0 replies; 16+ messages in thread
From: Andy Walker @ 2003-01-06  5:36 UTC (permalink / raw)
  To: Robert Dewar, lord; +Cc: gcc

On Sunday 05 January 2003 07:12 am, Robert Dewar wrote:
<snip>
> Actually I think the idea that the issue is in any sense related to the
> operation of complex caches is completely bogus. 

Bogus to you, speculative to me.

> In practice in typical
> x86 code, nearly all EBP references with small offsets (references to
> arguments or locals in the current stack frame) are in L1 cache. All our
> data shows that, 

I didn't know that.  Again, thank you for the information.

> so if the idea of the SR proposal is somehow to improve
> cache performance, that is unlikely to work out in practice. 
<snip>
> You really HAVE to look at specific x86 assembly language sequences to
> see whether there is anything in this idea or not. 

I couldn't agree more.  I will finish modifying gcc and we will try it and 
see.

Andy


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 13:25 Robert Dewar
  2003-01-06  5:36 ` Andy Walker
@ 2003-01-06 19:42 ` Tom Lord
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Lord @ 2003-01-06 19:42 UTC (permalink / raw)
  To: dewar; +Cc: dewar, gcc, ja_walker

       It's a little more fundamental than that. It is central to the
       design of the x86 that the variable length offsets are
       optimized by the assembler.

Well, duh .... but synthregs are not about that.  It was a separate
question (unless ja_walker is a right-twice-a-day broken clock here).

        And that's the problem. It is being argued at too high a
        level, and creates the impression of some principle that in
        fact I do not believe will show up as improved code. You *DO*
        need to examine GCC's generated code to realize that the code
        we generate today is essentially equivalent to the idea of
        SR's.

It's _not_ equivalent.  Looking at just the instruction sequences and
ignoring cache considerations -- synthregs are almost certainly
slightly worse.

Synthregs improve locality at a slight cost in code size.  I agree
with ja_walker that the worthiness of that trade-off is an empirical
question worth measuring.  I'd go beyond him by saying that in the 
medium term future, the trade-off almost certainly wins often.

       Remember that the code you get for accessing a synthetic
       register is identical *in all respects* to the code you get for
       accessing local variables and arguments.

Great.  And synthregs can put _more_ of the values in a computation
into that access class.   

     The [stupid gas] question here seemed to imply that the
     fundamental idea behind the SR proposal was to take advantage of
     the 3-byte MODRM format for efficient access to SR's, coupled
     with a belief that GCC was using a 6-byte MODRM format for normal
     memory references. If that were true it would have some interest,
     but it is simply not true.

Nah, it doesn't imply that at all.  If _that_ were ja_walker's
concern, then instead of a synthreg proposal, he'd have made a
proposal to "fix gas" (making it what it already is).  Best available
evidence is that he isn't that clueless.

     You really HAVE to look at specific x86 assembly language
     sequences to see whether there is anything in this idea or
     not. Yes, there are some machines on which the idea might play
     out effectively, but I am pretty convinced that the x86 is not
     one of them.

x86 is a "rapidly" exploding range of physical architectures.  I think
your statement is too sweeping, but I do get the impression you've
studied a subset of those physical architectures in excruciating
detail (hats off).  (Please don't miss the point of the compliment
and reply that "Yeah, well, until you've done the same you've no
business talking about the SR proposal" --- that'll get (even more)
tiresome real quick, I promise. ;-)

-t


* Re: Sythetic registers: modrm/gas question.
@ 2003-01-05 12:28 Robert Dewar
  2003-01-06  5:04 ` Andy Walker
  2003-01-06 19:47 ` Tom Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 2003-01-05 12:28 UTC (permalink / raw)
  To: gcc, ja_walker

> How do I tell gas to assemble an operand as a one-byte offset instead of a 
> four-byte offset?
> 
> e.g. 
> 
> MOV     eax,[ebp + 4]
> 
> Does it "just know?"
> 
> Andy

Yes, of course it "just knows", that's why your guess that gcc is generating
poor code seems ill-informed. It is inconceivable that *any* compiler would
use four byte offsets to access the local stack frame.


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 12:28 Robert Dewar
@ 2003-01-06  5:04 ` Andy Walker
  2003-01-06 19:47 ` Tom Lord
  1 sibling, 0 replies; 16+ messages in thread
From: Andy Walker @ 2003-01-06  5:04 UTC (permalink / raw)
  To: Robert Dewar, gcc

On Sunday 05 January 2003 05:42 am, Robert Dewar wrote:
<snip>
> Yes, of course it "just knows", that's why your guess that gcc is
> generating poor code seems ill-informed. It is inconceivable that *any*
> compiler would use four byte offsets to access the local stack frame.

Thank you for the answer.

My question here was not about compilers.  It was about assembler syntax.  I 
have reached the point where I am changing the machine description to handle 
Synthetic registers.  NASM requires a "byte" modifier in this instance, and 
it has been so long since I used the Borland Turbo Assembler on a 486, that I 
thought it prudent to check.  (There does not seem to be a manual for gas.)

Andy.


* Re: Sythetic registers: modrm/gas question.
  2003-01-05 12:28 Robert Dewar
  2003-01-06  5:04 ` Andy Walker
@ 2003-01-06 19:47 ` Tom Lord
  1 sibling, 0 replies; 16+ messages in thread
From: Tom Lord @ 2003-01-06 19:47 UTC (permalink / raw)
  To: dewar; +Cc: gcc, ja_walker

       > How do I tell gas to assemble an operand as a one-byte offset
       > instead of a four-byte offset?
       > 
       > e.g. 
       > 
       > MOV     eax,[ebp + 4]
       > 
       > Does it "just know?"

       Yes, of course it "just knows", that's why your guess that gcc
       is generating poor code seems ill-informed. It is inconceivable
       that *any* compiler would use four byte offsets to access the
       local stack frame.

_That's_ rude.

The question, since it could almost certainly have been answered with a
five-minute experiment (what, you don't have the necessary platform
around?).

The caustic reply (beginning at "that's why..."), because it is
illogical and prejudicial:  the synthregs proposal is argued for 
at a much higher level than would require that close an examination
of GCC's generated code -- and by-eye examination of that code 
doesn't at all easily help us decide if/when synthregs is a win.
Or are you good at simulating complex caches in your head based
on disassembly listings and demand that we all are as well?

-t


* Sythetic registers: modrm/gas question.
@ 2003-01-05  6:08 Andy Walker
  2003-01-06 23:54 ` Mike Stump
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Walker @ 2003-01-05  6:08 UTC (permalink / raw)
  To: gcc

How do I tell gas to assemble an operand as a one-byte offset instead of a 
four-byte offset?

e.g. 

MOV	eax,[ebp + 4]

Does it "just know?"

Andy


* Re: Sythetic registers: modrm/gas question.
  2003-01-05  6:08 Andy Walker
@ 2003-01-06 23:54 ` Mike Stump
  2003-01-07  6:15   ` Andy Walker
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Stump @ 2003-01-06 23:54 UTC (permalink / raw)
  To: Andy Walker; +Cc: gcc

On Saturday, January 4, 2003, at 09:55 PM, Andy Walker wrote:
> How do I tell gas to assemble an operand as a one-byte offset instead 
> of a
> four-byte offset?

You will want to learn how to ask GAS these questions.  Hint, type in 
the assembly, run as foo.s, and then objdump (maybe -d) the result.  
This is faster and generally more accurate than asking us.


* Re: Sythetic registers: modrm/gas question.
  2003-01-06 23:54 ` Mike Stump
@ 2003-01-07  6:15   ` Andy Walker
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Walker @ 2003-01-07  6:15 UTC (permalink / raw)
  To: Mike Stump; +Cc: gcc

On Monday 06 January 2003 05:48 pm, Mike Stump wrote:
> On Saturday, January 4, 2003, at 09:55 PM, Andy Walker wrote:
> > How do I tell gas to assemble an operand as a one-byte offset instead
> > of a
> > four-byte offset?
>
> You will want to learn how to ask GAS these questions.  Hint, type in
> the assembly, run as foo.s, and then objdump (maybe -d) the result.
> This is faster and generally more accurate than asking us.

Nice hint.  Thank you.

Andy

