From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-65881-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24222 invoked by alias); 7 Jan 2003 18:23:49 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 23731 invoked from network); 7 Jan 2003 18:22:29 -0000
Received: from unknown (HELO mail.kloo.net) (63.192.214.25)
  by 209.249.29.67 with SMTP; 7 Jan 2003 18:22:29 -0000
Received: by mail.kloo.net (Postfix, from userid 504)
	id 28BD63B0318; Tue,  7 Jan 2003 10:16:35 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
	by mail.kloo.net (Postfix) with ESMTP
	id 209FB3B4161; Tue,  7 Jan 2003 10:16:35 -0800 (PST)
Date: Tue, 07 Jan 2003 19:20:00 -0000
From: <tm_gccmail@mail.kloo.net>
To: Robert Dewar <dewar@gnat.com>
Cc: ja_walker@earthlink.net, lord@emf.net, mszick@goquest.com,
	gcc@gcc.gnu.org
Subject: Re: An unusual Performance approach using Synthetic registers
In-Reply-To: <Pine.LNX.4.21.0301070955310.18602-100000@mail.kloo.net>
Message-ID: <Pine.LNX.4.21.0301071008500.18627-100000@mail.kloo.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2003-01/txt/msg00362.txt.bz2

On Tue, 7 Jan 2003 tm_gccmail@mail.kloo.net wrote:

> On Tue, 7 Jan 2003, Robert Dewar wrote:
> 
> > > First, XCHG is what I think of as an Operating System instruction.  It is 
> > > quite valuable because the exchange can be limited to a single process on a 
> > > single processor in a multiprocessor system, in conjunction with the locking 
> > > process.  It is one of the very reliable ways to implement semaphores.  
> > 
> > Please look through the instruction set more carefully, this is NOT the way
> > you would implement any synchronization instructions on the x86.
> > 
> > Also, be very careful about timing of instructions when you start to look
> > at the complex instructions of the x86. No one should even think of generating
> > code for the x86 without reading the Intel guide for compiler writers. 
> > Basically the rule on most variants of the x86 is that you should treat
> > it as a conventional load/store RISC machine when it comes to generating
> > code.
> 
> Yes, read-modify-write operations on memory locations are very bad,
> especially on in-order execution members of the x86 family (Pentium and
> below). The problem is that RMW operations stall the pipeline because the
> modify/write cannot be performed until the load completes.
> 
> So basically:
> 
> 1) Load occurs
> 2) Processor stalls for two clocks waiting for the load to occur
> 3) Modify is done
> 4) Write is done
> 
> If the code is generated as for a strict load/store architecture, then
> other instructions can be executed during the latency of the load
> instruction to perform useful instructions.
> 
> Toshi

Now that I think about it, it's even worse on the Pentium/Pentium MMX than
I initially thought.

There are two instruction pipelines on the Pentium: the U pipe and the V
pipe. The U pipe can execute all the instructions, but the V pipe can only
execute simple instructions.

IIRC, the RMW instruction would only execute in the U pipe. So not only
would the processor stall for two clocks, there is also a chance of being
delayed another clock because it needs to be issued in the U pipe.
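
To make the arithmetic concrete, here is a toy model in C. It only
restates the numbers used in this thread (a two-clock load stall, plus one
possible extra clock waiting for the U pipe); it is not measured Pentium
timing, and the function names are made up for illustration.

```c
#include <assert.h>

/* "Stalls for two clocks", as described earlier in this thread. */
enum { LOAD_STALL_CLOCKS = 2 };

/* Clocks lost to the load stall when `independent` unrelated
   instructions are available to fill the gap. A fused RMW instruction
   has nothing it can overlap with, so pass 0 for that case. */
static int wasted_stall_clocks(int independent)
{
    assert(independent >= 0);
    return independent >= LOAD_STALL_CLOCKS
               ? 0
               : LOAD_STALL_CLOCKS - independent;
}

/* Penalty for a fused RMW instruction: the full stall, plus one clock
   if it has to wait to be issued in the U pipe. */
static int rmw_penalty_clocks(int must_wait_for_u_pipe)
{
    return wasted_stall_clocks(0) + (must_wait_for_u_pipe ? 1 : 0);
}
```

Under these assumptions, rmw_penalty_clocks(1) gives three lost clocks,
while a load/store sequence with two independent instructions to schedule
into the gap loses none.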

On the out-of-order execution members of the x86 family (Pentium II and
above) I believe CISC-style instructions are deprecated anyway because
they bottleneck the instruction decoder.

The x86 port originally generated RMW instructions on memory until about
four or five years ago, when I noticed that they are deprecated in the
Intel Compiler Writer's guide. I mentioned this on gcc or gcc-bugs, and
rth installed some patches to prevent RMW instructions from being
generated - if my recollection is correct.
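
For illustration, the two code shapes in question look roughly like this
in C. The assembly in the comments is what a compiler might plausibly emit
for 32-bit x86, not actual GCC output, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Shape 1: maps naturally onto a single read-modify-write
   instruction, e.g. "addl %eax, (%edx)". */
static void add_rmw(int *p, int x)
{
    assert(p != NULL);
    *p += x;
}

/* Shape 2: load/store style. The load is split from the modify and
   the store, so unrelated work can sit between them. */
static int add_load_store(int *p, int x, int a, int b)
{
    int tmp, y;

    assert(p != NULL);
    tmp = *p;        /* load:   movl (%edx), %ecx */
    y = a * 3 + b;   /* independent work, free to overlap the load */
    tmp += x;        /* modify: addl %eax, %ecx */
    *p = tmp;        /* store:  movl %ecx, (%edx) */
    return y;
}
```

Both functions leave the same value in *p. An optimizing compiler is free
to pick either instruction sequence regardless of how the source is
written, so this only shows the shapes, not a way to force one of them.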

We may still want to generate them if -Os is specified, since they
probably reduce code size. However, they definitely shouldn't be
generated under normal circumstances.
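
As a rough sanity check on the code size point, here are hand-assembled
32-bit encodings of one RMW add and the equivalent load/modify/store
sequence. The register choices are arbitrary and the identifiers are made
up; addressing modes with displacements would change the sizes.

```c
#include <assert.h>
#include <stddef.h>

/* addl %eax, (%edx) */
static const unsigned char rmw_form[] = { 0x01, 0x02 };

/* movl (%edx), %ecx ; addl %eax, %ecx ; movl %ecx, (%edx) */
static const unsigned char load_store_form[] = {
    0x8B, 0x0A,
    0x01, 0xC1,
    0x89, 0x0A
};

/* Bytes saved by the single RMW instruction in this one case. */
static size_t bytes_saved_by_rmw(void)
{
    /* Guard the unsigned subtraction below against wrapping. */
    assert(sizeof load_store_form >= sizeof rmw_form);
    return sizeof load_store_form - sizeof rmw_form;
}
```

In this particular case the RMW form is 2 bytes against 6 for the split
sequence, which is the kind of saving -Os would care about.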

Toshi