From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 27145 invoked by alias); 19 Mar 2007 17:53:26 -0000
Received: (qmail 26992 invoked by alias); 19 Mar 2007 17:52:21 -0000
Date: Mon, 19 Mar 2007 17:53:00 -0000
Message-ID: <20070319175221.26991.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "pinskia at gmail dot com"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2007-03/txt/msg01849.txt.bz2

------- Comment #6 from pinskia at gmail dot com  2007-03-19 17:52 -------
Subject: Re:  pseudo-optimzation with sincos/cexpi

On 19 Mar 2007 12:43:49 -0000, dominiq at lps dot ens dot fr wrote:
>
> Since sin() and cos() are non trivial functions, I am very surprised
> that a wrong API makes a 50% difference.

Well, here is how it can make a 50% difference (at least on the Cell;
the 970 is less restrictive, and only the dispatch group is rejected).
Modern PowerPC processors do not like storing a value to the stack and
then loading it back within a small number of cycles (around 50 cycles
on the Cell, while on the 970 the window is just a dispatch group).
Transferring between the integer register set and the floating-point
register set can only be done via memory, so you will get an LHS
(load-hit-store) or an LRU reject, depending on which processor you
are on. This causes either a 50-cycle delay or a reject of the dispatch
group (the latter can cause multiple rejects). The cycles lost to this
issue add up, because both sides of the function have this hazard.

Thanks,
Andrew Pinski

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31249
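
As a rough illustration of the kind of transfer described above (not
code from the bug report): the sketch below moves a 64-bit pattern
between an integer and a double. The helper names bits_to_double and
double_to_bits are hypothetical, and the bit patterns assume IEEE 754
doubles. On a processor with no direct move between the integer and
floating-point register files, a compiler would typically have to
implement each helper as a store to a stack slot followed immediately
by a load from the same slot, which is the store-then-load pattern
that triggers the hazard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit type, so the two objects are the same size. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: reinterpret 64 integer bits as a double.
 * Without a direct integer-to-FP register move, this is typically
 * compiled to an integer store followed by a floating-point load
 * of the same stack slot. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);   /* well-defined way to copy the bits */
    return d;
}

/* Hypothetical helper: the opposite direction, a floating-point store
 * followed by an integer load of the same slot. */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Whether a given build actually emits the store/load pair depends on the
target and the compiler; checking the generated assembly is the only
way to be sure.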