From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 10829 invoked by alias); 8 Aug 2006 02:59:26 -0000
Received: (qmail 10744 invoked by uid 48); 8 Aug 2006 02:59:17 -0000
Date: Tue, 08 Aug 2006 02:59:00 -0000
Message-ID: <20060808025917.10743.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "whaley at cs dot utsa dot edu"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2006-08/txt/msg00581.txt.bz2
List-Id:

------- Comment #45 from whaley at cs dot utsa dot edu  2006-08-08 02:59 -------
Guys,

OK, with Dorit's -fdump-tree-vect-details, I made a little progress on
vectorization.  In order to get vectorization to work, I had to add the flag
'-funsafe-math-optimizations'.  I will try to create a tarfile with everything
tomorrow so you guys can see all the output, but is it normal to need to throw
this flag to get vectorization?  SSE is IEEE compliant (unless you turn it
off), and ATLAS needs to stay IEEE, so I can't turn on unsafe-math-opt in
general ...

With these flags, gcc can vectorize the kernel if I do no unrolling at all.  I
have not yet run the full search with these flags, but I've done quite a few
hand-called cases.  For double precision, the performance is lower than either
the x87 (best) or scalar SSE code on both the P4E and Ath64X2.  For single
precision, there is a modest speedup over the x87 code on both systems, but
the total is *way* below my assembly SSE kernels.

I just quickly glanced at the code, and I see that it never uses "movapd" from
memory, which is key to getting decent performance.  ATLAS ensures that the
input matrices (A & B) are 16-byte aligned.
Is there any pragma/flag/etc I can set that says "pointer X points to data
that is 16-byte aligned"?

Thanks,
Clint


--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
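
On the '-funsafe-math-optimizations' question above: a likely reason the flag
is needed, assuming the kernel's inner loop is a dot-product style
accumulation (the kernel itself is not shown in this comment), is that
vectorizing a floating-point reduction changes the order of the additions,
and GCC will not do that without permission to reassociate.  A minimal C
sketch of the effect follows.  Both function names are hypothetical, and the
second function only imitates by hand what a 2-wide SSE2 reduction would
compute; it is not actual vectorizer output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar kernel: a single accumulator, so the additions
   happen strictly in index order, as IEEE semantics require. */
double dot_sequential(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hand-written imitation of a 2-wide vectorized reduction: even and odd
   elements accumulate into separate partial sums ("lanes") that are
   combined only at the end.  Assumes n is even to keep the sketch short. */
double dot_two_lanes(const double *x, const double *y, size_t n)
{
    assert(n % 2 == 0);
    double lane0 = 0.0, lane1 = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        lane0 += x[i]     * y[i];
        lane1 += x[i + 1] * y[i + 1];
    }
    return lane0 + lane1;
}
```

With x = {1e20, 1.0, -1e20, 1.0} and y all ones, the sequential version
returns 1.0 and the two-lane version returns 2.0, because the first 1.0 is
absorbed by 1e20 in one ordering and not in the other.  That difference is
why the reordering counts as an "unsafe" math optimization.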