From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 4916 invoked by alias); 15 Mar 2008 00:10:33 -0000
Received: (qmail 4894 invoked by uid 22791); 15 Mar 2008 00:10:20 -0000
X-Spam-Check-By: sourceware.org
Received: from thermalthree.footholds.net (HELO thermalthree.footholds.net) (195.62.28.240)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Sat, 15 Mar 2008 00:09:58 +0000
Received: from [78.150.20.175] (helo=mercury.local)
    by thermalthree.footholds.net with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68)
    (envelope-from ) id 1JaJyE-0000VX-F7
    for gcc-help@gcc.gnu.org; Sat, 15 Mar 2008 00:09:54 +0000
Date: Sat, 15 Mar 2008 00:10:00 -0000
From: Maximillian Murphy
To: gcc-help@gcc.gnu.org
Subject: Re: is -O2 breaking sse2 alignment?
Message-Id: <20080315072913.a192fdb0.m@de-minimis.co.uk>
In-Reply-To: <47D9C2EF.9070004@gmail.com>
References: <47D8679C.4090606@gmail.com> <47D8751F.722F7196@dessent.net> <47D9C2EF.9070004@gmail.com>
X-Mailer: Sylpheed version 2.3.0beta5 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2008-03/txt/msg00141.txt.bz2

>
> What I am _really_ trying to do is to implement is the addition of
> elements of two arrays.
>
> Is there a more efficient way of doing this than this way?:
>

Question from someone who has just written his first few lines of SSE2
(oh how exciting, but let's not get too excited until we can actually
beat the SSE-free standard compile!):

How many SSE2 instructions can be run at the same time? I would have
thought that if there is much optimising to be done it will be in
loading up all the registers and doing lots of SSE instructions in
parallel.
Presumably the challenge will be organising traffic to and from the
registers so that we don't get spikes from loading registers
simultaneously. Rather we'd have to load one pair of registers whilst
simultaneously adding together another pair whilst simultaneously
writing out the result of a third. That kind of thing.

Am I on the right track or am I way off the mark?

Regards, Max.
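
To make the question concrete, here is a minimal sketch of the kind of loop being described: two independent load/add/store chains per iteration, so that the hardware is free to overlap the loads of one pair of registers with the add and store of another. The function name add_arrays, the double element type, the unroll factor of two and the use of unaligned loads are all assumptions made for illustration; none of them comes from the thread. Whether this actually beats the plain loop at -O2 would have to be measured.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Hypothetical helper (not from the thread): out[i] = a[i] + b[i]
   for i in [0, n). Unaligned loads and stores are used so that no
   particular alignment of the arrays is assumed. */
void add_arrays(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Each __m128d holds two doubles. The loop is unrolled by two, so
       one iteration handles four elements through two chains that do
       not depend on each other. */
    for (; i + 4 <= n; i += 4) {
        __m128d a0 = _mm_loadu_pd(a + i);
        __m128d b0 = _mm_loadu_pd(b + i);
        __m128d a1 = _mm_loadu_pd(a + i + 2);
        __m128d b1 = _mm_loadu_pd(b + i + 2);

        _mm_storeu_pd(out + i,     _mm_add_pd(a0, b0));
        _mm_storeu_pd(out + i + 2, _mm_add_pd(a1, b1));
    }
#endif

    /* Scalar tail; this is also the whole loop when SSE2 is not
       available at compile time. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The scalar tail doubles as a reference version, so timing the function with and without -msse2 (on targets where SSE2 is not the default) would be one way to compare against the SSE-free compile.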