From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-268509-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 4472 invoked by alias); 19 Jul 2010 16:23:37 -0000
Received: (qmail 4458 invoked by uid 22791); 19 Jul 2010 16:23:35 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0	tests=AWL,BAYES_00,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from ksp.mff.cuni.cz (HELO atrey.karlin.mff.cuni.cz) (195.113.26.206)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 19 Jul 2010 16:23:30 +0000
Received: by atrey.karlin.mff.cuni.cz (Postfix, from userid 4018)	id 5C877F04E3; Mon, 19 Jul 2010 18:23:27 +0200 (CEST)
Date: Mon, 19 Jul 2010 16:23:00 -0000
From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Henderson <rth@redhat.com>
Cc: Bernd Schmidt <bernds@codesourcery.com>,	"H.J. Lu" <hjl.tools@gmail.com>,	GCC Patches <gcc-patches@gcc.gnu.org>, ubizjak@gmail.com
Subject: Re: x86_64 varargs setup jump table
Message-ID: <20100719162327.GD17201@atrey.karlin.mff.cuni.cz>
References: <4C4035C3.9080305@codesourcery.com> <4C40A5BD.9080208@redhat.com> <4C40F005.3060507@codesourcery.com> <AANLkTimnDbQ0XEbp-CLFUwQiJTX9OwTN9YvfczoxsPsp@mail.gmail.com> <4C41BD52.5040905@codesourcery.com> <4C447222.7080500@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4C447222.7080500@redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-07/txt/msg01505.txt.bz2

> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
> >  	leaq	0(,%rax,4), %rcx
> >  	movl	$.L2, %eax
> >  	subq	%rcx, %rax
> >  	jmp	*%rax
> 
> I've often thought this was over-engineering in the x86_64 abi.
> This jump table is trading memory bandwidth for unpredictability
> in the branch target.
> 
> I've often wondered if we'd get better performance if we changed
> to a simple comparison against zero.  I.e.
> 
> 	test	%al,%al
> 	jz	1f
> 	// 8 xmm stores
> 1:
> 
> H.J., do you think you'd be able to measure performance on this?

The original problem was that early K8 chips had no way of efficiently
storing an SSE register to memory without knowing its type.  So the stores in
the prologue executed very slowly when reformatting happened.  The same reason
was behind not having callee-saved/restored SSE regs.

On current chips this is not a big issue, so I do not care which way we output.
In fact I used to have a patch for doing the jz but lost it.  I think we might
keep supporting both to get some checking that the ABI is not terribly broken
(i.e. that no other compilers just feed rax with a random value, but always
with the number of args).
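
For reference, a minimal C sketch of the kind of function whose prologue is
being discussed.  The name sum_doubles is made up for illustration and does
not come from the thread.  Under the x86_64 SysV ABI the caller of such a
function sets %al to an upper bound on the number of vector registers used,
and the callee's prologue uses that value either to jump into the sequence
of xmm stores or, in the simpler scheme quoted above, to skip all eight
stores when it is zero.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example, not from the thread: sums `count` double
   arguments.  The first eight doubles arrive in %xmm0..%xmm7, so the
   prologue must spill those registers to the register save area
   before va_arg can read them; any further doubles come from the
   stack. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```

A call such as sum_doubles(3, 1.0, 2.0, 3.5) passes three vector arguments,
while sum_doubles(0) passes none and is the case where the test/jz variant
would skip the stores entirely.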

Honza
> 
> 
> 
> r~