From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 24222 invoked by alias); 7 Jan 2003 18:23:49 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 23731 invoked from network); 7 Jan 2003 18:22:29 -0000
Received: from unknown (HELO mail.kloo.net) (63.192.214.25) by 209.249.29.67 with SMTP; 7 Jan 2003 18:22:29 -0000
Received: by mail.kloo.net (Postfix, from userid 504) id 28BD63B0318; Tue, 7 Jan 2003 10:16:35 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by mail.kloo.net (Postfix) with ESMTP id 209FB3B4161; Tue, 7 Jan 2003 10:16:35 -0800 (PST)
Date: Tue, 07 Jan 2003 19:20:00 -0000
From:
To: Robert Dewar
Cc: ja_walker@earthlink.net, lord@emf.net, mszick@goquest.com, gcc@gcc.gnu.org
Subject: Re: An unusual Performance approach using Synthetic registers
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2003-01/txt/msg00362.txt.bz2

On Tue, 7 Jan 2003 tm_gccmail@mail.kloo.net wrote:

> On Tue, 7 Jan 2003, Robert Dewar wrote:
>
> > > First, XCHG is what I think of as an Operating System instruction. It is
> > > quite valuable because the exchange can be limited to a single process on a
> > > single processor in a multiprocessor system, in conjunction with the locking
> > > process. It is one of the very reliable ways to implement semaphores.
> >
> > Please look through the instruction set more carefully, this is NOT the way
> > you would implement any synchronization instructions on the x86.
> >
> > Also, be very careful about timing of instructions when you start to look
> > at the complex instructions of the x86. No one should even think of generating
> > code for the x86 without reading the Intel guide for compiler writers.
> > Basically the rule on most variants of the x86 is that you should treat
> > it as a conventional load/store RISC machine when it comes to generating
> > code.
>
> Yes, read-modify-write operations on memory locations is very bad,
> especially on in-order execution members of the x86 family (Pentium and
> below). The problem is that RMW operations stall the pipeline because the
> modify/write cannot be performed until the load completes.
>
> So basically:
>
> 1) Load occurs
> 2) Processor stalls for two clocks waiting for the load to occur
> 3) Modify is done
> 4) Write is done
>
> If the code is generated as for a strict load/store architecture, then
> other instructions can be executed during the latency of the load
> instruction to perform useful instructions.
>
> Toshi

Now that I think about it, it's even worse on the Pentium/Pentium MMX than
I initially thought.

There are two instruction pipelines on the Pentium: the U pipe and the V
pipe. The U pipe can execute all the instructions, but the V pipe can only
execute simple instructions. IIRC, the RMW instruction would only execute
in the U pipe.

So not only would the processor stall for two clocks, there is also a
chance of being delayed another clock because the instruction needs to be
issued in the U pipe.

On the out-of-order execution members of the x86 family (Pentium II and
above) I believe CISC-style instructions are deprecated anyway because
they bottleneck the instruction decoder.

The x86 port originally did generate RMW instructions on memory up until
about four or five years ago, when I noticed that they are deprecated in
the Intel Compiler Writer's guide. I mentioned this on gcc or gcc-bugs, and
rth installed some patches to prevent RMW instructions from being
generated - if my recollection is correct.

I think we may want to generate them if -Os is specified, since they
probably reduce code size. However, they definitely shouldn't be generated
under normal circumstances.

Toshi
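
A rough way to put numbers on the four steps quoted above is a toy cycle
count. The Python sketch below is an illustration only, not a model of real
Pentium timings: it assumes a single-issue in-order pipeline, one clock per
load, modify, write, or independent instruction, and takes the two-clock
load stall from the message. The function names and the constant are
hypothetical and do not come from gcc or from Intel documentation. It also
ignores the U/V pipe pairing issue.

```python
# Toy cycle model of an in-order, single-issue pipeline.
# Assumption: every load, modify, write, or independent op takes one clock.
# The two-clock stall is the figure quoted in the message, nothing more.
LOAD_STALL_CLOCKS = 2


def rmw_cycles(independent_ops):
    """Clocks for one RMW-on-memory instruction plus independent ops.

    The load, stall, modify, and write all happen inside one instruction,
    so nothing else can be scheduled into the stall.
    """
    rmw = 1 + LOAD_STALL_CLOCKS + 1 + 1  # load, stall, modify, write
    return rmw + independent_ops


def load_store_cycles(independent_ops):
    """Clocks for separate load / modify / store instructions.

    Independent ops are scheduled right after the load, so up to
    LOAD_STALL_CLOCKS of them run while the load is still in flight.
    """
    hidden = min(independent_ops, LOAD_STALL_CLOCKS)
    remaining_stall = LOAD_STALL_CLOCKS - hidden
    leftover_ops = independent_ops - hidden
    return 1 + hidden + remaining_stall + 1 + 1 + leftover_ops


if __name__ == "__main__":
    for n in range(5):
        print(n, rmw_cycles(n), load_store_cycles(n))
```

Under these assumptions the two forms cost the same when there is nothing
independent to schedule, and the load/store form saves at most the length
of the stall once there is enough independent work to fill it.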