From: "H.J. Lu"
To: Jan Hubicka
Cc: Jan Hubicka, discuss@x86-64.org, GCC, "Girkar, Milind", "Dmitriev, Serguei N", "Kreitzer, David L"
Date: Fri, 06 Jun 2008 14:28:00 -0000
Subject: Re: RFC: Extend x86-64 psABI for 256bit AVX register
Message-ID: <20080606142813.GA18621@lucon.org>
In-Reply-To: <20080606135026.GA14877@lucon.org>
References: <6dc9ffc80806050731s77b49d63id048d142d76560c9@mail.gmail.com> <20080605151511.GB24241@atrey.karlin.mff.cuni.cz> <6dc9ffc80806050914t76383385o380c0bb8ebc4e972@mail.gmail.com> <20080606082834.GC31743@kam.mff.cuni.cz> <20080606135026.GA14877@lucon.org>

On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
> > >
> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128 bits
> > > of ymm0. I am not sure if we need separate XMM registers from
> > > YMM registers.
> >
> > Yes, I know that xmm0 is the lower part of ymm0. I still think we ought
> > to be able to support varargs functions that save the YMM registers
> > only when YMM values are passed, the same way we touch SSE registers
> > only when SSE values are passed, via the EAX hint.
>
> Which register do you propose for the hint? The current psABI uses RAX
> for XMM registers. We can't change it to AL and AH for YMM without
> breaking backward compatibility.
>
> > This way we will be able to support, e.g., a printf that has a
> > YMM-printing % construct but doesn't need YMM-enabled hardware when
> > that construct is not used.
> >
> > This is why I think extending EAX to contain the number of XMM values
> > to save, and in addition the number of YMM values to save, is sane.
> > Then old, non-YMM-aware varargs prologues will crash when YMM values
> > are passed, but all other combinations will work.
>
> I don't think it is necessary, since -mavx will enable AVX code
> generation for all SSE code. Unless the function only uses integers,
> it will crash on non-YMM-aware hardware. That is, if even one SSE
> register is used, which is hinted in RAX, the varargs prologue will
> use AVX instructions to save it. We don't need another hint for AVX
> instructions.
>
> > > >
> > > > I personally don't have much preference between 1. and 2.; 1. seems
> > > > relatively easy to implement too, or is packing two 128bit values
> > > > into a single 256bit value difficult in va_arg expansion?
> > > >
> > >
> > > Accessing a 256bit register as lower and upper 128 bits needs 2
> > > instructions.
> > > For store:
> > >
> > > vmovaps %xmm7, -143(%rax)
> > > vextractf128 $1, %ymm7, -15(%rax)
> > >
> > > For load:
> > >
> > > vmovaps -143(%rax), %xmm7
> > > vinsertf128 $1, -15(%rax), %ymm7, %ymm7
> > >
> > > If we go beyond 256bit, we need more instructions to access
> > > the full register. For 512bit, it will be split into lower 128 bits,
> > > middle 128 bits and upper 256 bits. 1024bit will have 4 parts.
> > >
> > > For #2, only one instruction will be needed for 256bit and
> > > beyond.
> >
> > Yes, but we will still save half of the stack space. Well, I don't
> > have much preference here. If it seems saner to simply save the whole
> > thing, saving the lower part twice, I am fine with that.
>
> I was told that it wasn't very easy to get decent performance with
> split access. I extended my proposal to include a 16bit bitmask to
> indicate which YMM registers should be saved. If the bit is 0,
> we should only save the lower 128 bits in the original register
> save area. Otherwise, we should save the whole YMM register.
>

On second thought, how useful is such a bitmask? Do we really need it?
Is it acceptable to save the lower 128 bits twice?

Thanks.


H.J.
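A rough byte count makes the stack-space trade-off in this thread concrete. The sketch below assumes the eight vector argument registers of the current psABI (xmm0-xmm7, one 16-byte slot each in the existing register save area) and counts only the bytes a varargs prologue would store. The function names are hypothetical, and the bitmask variant follows one reading of the proposal above (clear bit: lower 128 bits in the original slot; set bit: the whole 256bit register in a new area), not a specified layout.

```c
#include <stdint.h>

/* Assumption: vector arguments travel in xmm0-xmm7 / ymm0-ymm7 and the
   existing register save area keeps one 16-byte slot per register. */
enum {
    NUM_VEC_ARG_REGS = 8,
    XMM_BYTES = 16, /* 128bit lower half */
    YMM_BYTES = 32  /* full 256bit register */
};

/* Option 1 (split): lower halves go to the existing 16-byte slots,
   upper halves to a new area of the same size; nothing is duplicated. */
unsigned split_layout_bytes(void)
{
    return NUM_VEC_ARG_REGS * XMM_BYTES                 /* lower halves */
         + NUM_VEC_ARG_REGS * (YMM_BYTES - XMM_BYTES);  /* upper halves */
}

/* Option 2 (whole register): the lower halves stay in the existing slots
   and every full YMM register is stored again in a new area, so the
   lower 128 bits are saved twice. */
unsigned duplicate_layout_bytes(void)
{
    return NUM_VEC_ARG_REGS * XMM_BYTES + NUM_VEC_ARG_REGS * YMM_BYTES;
}

/* Hypothetical bitmask variant: bit i set => ymm<i> is stored whole in a
   new area; bit clear => only its lower 128 bits go to the original slot.
   Bits 8-15 of the 16bit mask would map to ymm8-ymm15, which carry no
   arguments, so they are ignored here. */
unsigned bitmask_layout_bytes(uint16_t ymm_mask)
{
    unsigned bytes = 0;
    for (int i = 0; i < NUM_VEC_ARG_REGS; i++)
        bytes += ((ymm_mask >> i) & 1u) ? YMM_BYTES : XMM_BYTES;
    return bytes;
}
```

Under these assumptions the split layout stores 256 bytes and the duplicating layout 384 bytes, while the bitmask variant ranges from 128 bytes (mask 0) to 256 bytes (all eight argument bits set); `bitmask_layout_bytes(0x0001)` gives 144 bytes (32 for ymm0 plus 7 × 16). The 128 bytes of duplicated lower halves are what option 2 pays for single-instruction access to the whole register.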