Date: Fri, 06 Jun 2008 15:01:00 -0000
From: Jakub Jelinek
To: "H.J. Lu"
Cc: discuss@x86-64.org, GCC, "Girkar, Milind", "Dmitriev, Serguei N"
Subject: Re: RFC: Extend x86-64 psABI for 256bit AVX register
Message-ID: <20080606151305.GV3726@sunsite.mff.cuni.cz>
In-Reply-To: <6dc9ffc80806050731s77b49d63id048d142d76560c9@mail.gmail.com>

On Thu, Jun 05, 2008 at 07:31:12AM -0700, H.J. Lu wrote:
> 1. Extend the register save area to put the upper 128 bits at the end.
>    Pros:
>      Aligned access.
>      Saves stack space if 256-bit registers are used.
>    Cons:
>      Split access.  Requires even more split accesses beyond 256 bits.
>
> 2. Extend the register save area to put the full 256-bit YMMs at the end.
>    The first DWORD after the register save area holds the offset of
>    the extended array for YMM registers.  The next DWORD holds the
>    element size of the extended array.  Unaligned access will be used.
>    Pros:
>      No split access.
>      Easily extendable beyond 256 bits.
>      Limited unaligned-access penalty if the stack is aligned to 32 bytes.
>    Cons:
>      May require storing both the lower 128 bits and the full 256-bit
>      register content.  We may avoid saving the lower 128 bits if the
>      correct type is required when accessing the variable argument list,
>      similar to int vs. double.
>      Wastes 272 bytes on the stack when 256-bit registers are used.
>      Unaligned loads and stores.

Or:

3. Pass unnamed __m256 arguments both in YMM registers and on the stack,
   or just on the stack.

How often do you think people pass vectors to varargs functions?  I don't
think I have seen that yet, except in gcc testcases.  The x86_64 float
varargs setup prologue is already quite slow now; do we want to make it
even slower for something very rarely used?  Although we have the
tree-stdarg optimization pass, which is able to optimize the varargs
prologue setup code in some cases, it can't help for printf etc., as
printf etc. just do va_start, pass the va_list to another function and do
va_end, so the compiler must account for any possibility.  Named __m256
arguments would still be passed in YMM registers only...

	Jakub