Date: Fri, 06 Jun 2008 15:01:00 -0000
From: Jakub Jelinek
To: "H.J. Lu"
Cc: discuss@x86-64.org, GCC, "Girkar, Milind", "Dmitriev, Serguei N"
Subject: Re: RFC: Extend x86-64 psABI for 256bit AVX register
Message-ID: <20080606151305.GV3726@sunsite.mff.cuni.cz>
In-Reply-To: <6dc9ffc80806050731s77b49d63id048d142d76560c9@mail.gmail.com>

On Thu, Jun 05, 2008 at 07:31:12AM -0700, H.J. Lu wrote:
> 1. Extend the register save area to put the upper 128 bits at the end.
>    Pros:
>      Aligned access.
>      Saves stack space if 256-bit registers are used.
>    Cons:
>      Split access.  Requires even more split accesses beyond 256 bits.
>
> 2. Extend the register save area to put the full 256-bit YMMs at the end.
>    The first DWORD after the register save area holds the offset of
>    the extended array for YMM registers.  The next DWORD holds the
>    element size of the extended array.  Unaligned access will be used.
>    Pros:
>      No split access.
>      Easily extendable beyond 256 bits.
>      Limited unaligned-access penalty if the stack is aligned to 32 bytes.
>    Cons:
>      May require storing both the lower 128 bits and the full 256-bit
>      register content.  We may avoid saving the lower 128 bits if the
>      correct type is required when accessing the variable argument list,
>      similar to int vs. double.
>      Wastes 272 bytes on the stack when 256-bit registers are used.
>      Unaligned loads and stores.

Or:

3. Pass unnamed __m256 arguments both in YMM registers and on the stack,
   or just on the stack.

How often do you think people pass vectors to varargs functions?  I don't
think I have seen that yet, except in gcc testcases.  The x86_64 float
varargs setup prologue is already quite slow now; do we want to make it
even slower for something very rarely used?  Although we have the
tree-stdarg optimization pass, which is able to optimize the varargs
prologue setup code in some cases, it can't help for printf etc., as
printf etc. just do va_start, pass the va_list to another function and do
va_end, so the compiler must account for any possibility.  Named __m256
arguments would still be passed in YMM registers only...

	Jakub