From: "Clem Taylor"
To: gcc-help@gcc.gnu.org
Subject: Re: workaround for "error: more than 30 operands in 'asm'"?
Date: Fri, 14 Mar 2008 01:10:00 -0000

On Thu, Mar 13, 2008 at 1:56 AM, Ian Lance Taylor wrote:
> Since you mention the number of registers you are using, note that
> that only matters if they are inputs or outputs. If you need a
> temporary register, just pick one, and add it to the clobber list. But
> if you really have that many inputs and outputs, then you are stuck.

I'm using inputs and outputs because I want the compiler to pick the
registers and I want to have named values. The inline block looks
something like:

    asm (
        "... bunch-o-vmx code ..."
        : [rIn0] "=rv" (rIn0), [gIn0] "=rv" (gIn0), [bIn0] "=rv" (bIn0), ...
        : [rpix] "r" (rpix), [gpix] "r" (gpix), [bpix] "r" (bpix), ...
        : "memory"
    );

Writing this type of code using %0 %1 ... %n would be very painful and
unpleasant to maintain.
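For illustration, the two approaches discussed above (named operands, plus
a hand-picked temporary in the clobber list) can be combined. What follows
is only a minimal sketch, not the VMX code from this thread: it assumes an
x86 target and GCC's default AT&T syntax, and the function name add_named
and the choice of ecx as the scratch register are hypothetical. Only the
three named values count as operands; the scratch register appears in the
clobber list and, per Ian's note, does not.

```c
#include <stdio.h>

/* Hypothetical helper: returns a + b.
 * Named operands ([sum], [a], [b]) are allocated by the compiler.
 * The temporary (ecx) is picked by hand and listed as a clobber,
 * so it is not an input or output operand. */
static int add_named(int a, int b)
{
    int sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "movl %[a], %%ecx\n\t"    /* scratch = a            */
        "addl %[b], %%ecx\n\t"    /* scratch += b           */
        "movl %%ecx, %[sum]"      /* sum = scratch          */
        : [sum] "=r" (sum)        /* output, compiler picks */
        : [a] "r" (a), [b] "r" (b)/* inputs, compiler picks */
        : "ecx", "cc"             /* hand-picked temp, flags */
    );
#else
    /* Assumption: on other targets, fall back to plain C so the
     * sketch still compiles. */
    sum = a + b;
#endif
    return sum;
}
```

The trade-off is the one described above: the clobbered register no longer
counts against the operand limit, but it also loses its name and the
compiler no longer chooses it.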

If gcc 4.2.x did a sane job of scheduling the C intrinsic version of this
code, I wouldn't need to use inline assembly. I wrote the C version in an
order that shouldn't cause any stalls, but the compiler reorders the code
and recomputes the offset constants inside the loop [values like 0, 16,
32, ... 112]. With all the write/read stalls and extra addi instructions,
the C intrinsic version runs at >5 cycles per instruction, and overall the
asm version is ~10x faster, ouch.

--Clem