Date: Mon, 19 Jul 2010 16:23:00 -0000
From: Jan Hubicka
To: Richard Henderson
Cc: Bernd Schmidt, "H.J. Lu", GCC Patches, ubizjak@gmail.com
Subject: Re: x86_64 varargs setup jump table
Message-ID: <20100719162327.GD17201@atrey.karlin.mff.cuni.cz>
In-Reply-To: <4C447222.7080500@redhat.com>

> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
> >         leaq    0(,%rax,4), %rcx
> >         movl    $.L2, %eax
> >         subq    %rcx, %rax
> >         jmp     *%rax
>
> I've often thought this was over-engineering in the x86_64 abi.
> This jump table is trading memory bandwidth for unpredictability
> in the branch target.
>
> I've often wondered if we'd get better performance if we changed
> to a simple comparison against zero.  I.e.
>
>         test    %al,%al
>         jz      1f
>         // 8 xmm stores
> 1:
>
> H.J., do you think you'd be able to measure performance on this?

The original problem was that early K8 chips had no way of efficiently
storing an SSE register to memory without knowing its type, so the stores
in the prologue executed very slowly when reformatting happened.  That was
also the reason for not having callee-saved/restored SSE regs.

On current chips this is not a big issue, so I do not care which way we
output it.  In fact, I used to have a patch for doing the jz but lost it.
I think we might keep supporting both to get some checking that the ABI is
not terribly broken (i.e. that no other compilers just feed rax a random
value, rather than always the number of args).

Honza
>
>
>
>
> r~
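
To make the difference between the two prologue styles concrete, here is a
rough C model of which XMM argument registers each one would spill for a
given value of %al.  It is only a sketch: the function names are made up for
illustration, and it assumes the stores are 4 bytes each and laid out so
that jumping 4 * al bytes before the end label executes the stores for the
first al registers.

```c
/* Bit i of the result is set when xmm<i> would be stored to the
   register save area.  Hypothetical helpers, not GCC code. */

/* Computed-jump prologue: exactly the first `al` registers are stored.
   An out-of-range `al` is reported as -1; the real code would jump to
   an address outside the store sequence. */
static int saved_by_computed_jump(unsigned char al)
{
    if (al > 8)
        return -1;
    return (1 << al) - 1;
}

/* test/jz prologue: either all eight registers are stored or none. */
static int saved_by_test_jz(unsigned char al)
{
    return al != 0 ? 0xff : 0;
}
```

In this model, al = 3 gives 0x07 for the computed jump and 0xff for test/jz.
A caller that leaves garbage in %al is tolerated by test/jz but not by the
computed jump, which is why keeping both around gives some checking of what
callers actually pass.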