From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <6dc9ffc80908071530x7d4a3965u8021df66a142a0bf@mail.gmail.com>
References: <Pine.LNX.4.64.0908070254030.30134@artax.karlin.mff.cuni.cz> <20090807071305.GX4462@tyan-ft48-01.lab.bos.redhat.com> <6dc9ffc80908070553q6f9b1b78lc19e6e4a4a5ec73b@mail.gmail.com> <6dc9ffc80908071530x7d4a3965u8021df66a142a0bf@mail.gmail.com>
Date: Mon, 24 Aug 2009 17:39:00 -0000
Message-ID: <6dc9ffc80908240900l73d3c97fo2c31fbd0142e75d2@mail.gmail.com>
Subject: Re: PATCH: PR target/40838: gcc shouldn't assume that the stack is aligned
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>, gcc-patches@gcc.gnu.org, ubizjak@gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On Fri, Aug 7, 2009 at 3:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Aug 7, 2009 at 5:53 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Aug 7, 2009 at 12:13 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Fri, Aug 07, 2009 at 02:54:46AM +0200, Mikulas Patocka wrote:
>>>> > > In 32-bit code, the incoming stack may not be 16-byte aligned.  This patch
>>>> > > assumes the incoming stack is 4-byte aligned and realigns the stack if any
>>>> > > SSE variable is put on the stack. Any comments?
>>>> >
>>>> > IMHO this is wrong. I could live with a non-default option for those who
>>>> > don't care about performance and think a SCO document from 1996 has any
>>>> > relevance to Linux these days.  In reality, the Linux ABI has assumed
>>>> > 16-byte stack alignment for 32-bit code for years.
>>>>
>>>> Tell me which Linux distribution you ran with 16-byte stack alignment
>>>> checking (as proposed in bug 40838), and what the result was.
>>>>
>>>> For me, the result was that 75% of binaries in /bin in Debian Lenny do not
>>>> align the stack on a 16-byte boundary.
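Any such checking first has to define what "aligned" means at function entry. The sketch below assumes the usual i386 convention that the stack is aligned at the CALL instruction, which then pushes a 4-byte return address, so the entry stack pointer plus 4 should be a multiple of the boundary. The helper name incoming_stack_is_aligned is hypothetical; it is not the check proposed in bug 40838, and it only tests a stack pointer value supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC or glibc.  On i386 the CALL
   instruction pushes a 4-byte return address, so if the caller had the
   stack aligned to `boundary` bytes at the call, the stack pointer seen
   at function entry satisfies (sp + 4) % boundary == 0.  */
static bool incoming_stack_is_aligned(uintptr_t sp_at_entry,
                                      size_t return_address_size,
                                      size_t boundary)
{
    /* The mask test below only works for power-of-two boundaries.  */
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((sp_at_entry + return_address_size) & (boundary - 1)) == 0;
}
```

For example, an entry stack pointer of 0xbffff00c passes the 16-byte check, while 0xbffff008 fails it but still passes a 4-byte check.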
>>>
>>> Besides the obstack glibc bug, which has since been fixed, you haven't
>>> reported anything in particular.  It is true that parts of i?86 glibc are
>>> compiled with -mpreferred-stack-boundary=2, but only parts that don't call
>>> callbacks.  Async signals, AFAIK, will align the stack properly.
>>>
>>> I simply don't trust your 75% claim; lots of stuff would break if things
>>> weren't aligned properly.
>>>
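The option value is an exponent: per the GCC documentation, -mpreferred-stack-boundary=N asks for a 2**N-byte boundary, so the =2 used for parts of glibc means 4 bytes, while the 128-bit (16-byte) default corresponds to =4. A minimal sketch of that conversion; both helper names are hypothetical, and the bound on N is an arbitrary choice for this sketch rather than GCC's accepted range.

```c
#include <assert.h>

/* Hypothetical helpers.  -mpreferred-stack-boundary=N asks GCC to keep
   the stack aligned to 2**N bytes; the i386 back end stores the same
   boundary in bits (128 bits == 16 bytes).  */
static unsigned preferred_boundary_bytes(unsigned n)
{
    /* Arbitrary bound for this sketch; keeps the shift and the *8 below
       well inside the range of unsigned.  */
    assert(n < 16);
    return 1u << n;
}

static unsigned preferred_boundary_bits(unsigned n)
{
    return preferred_boundary_bytes(n) * 8u; /* 8 bits per byte */
}
```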
>>
>> From gcc 3.4:
>>
>>  /* Validate -mpreferred-stack-boundary= value, or provide default.
>>     The default of 128 bits is for Pentium III's SSE __m128, but we
>>     don't want additional code to keep the stack aligned when
>>     optimizing for code size.  */
>>  ix86_preferred_stack_boundary = (optimize_size
>>                                   ? TARGET_64BIT ? 128 : 32
>>                                   : 128);
>>
>> If you compile 32-bit code with -Os, you get 4-byte stack alignment.
>> Step back for a moment: we changed the stack alignment from 4 bytes to
>> 16 bytes for SSE because we couldn't realign the stack at the time. Now
>> we can realign the stack very efficiently. I think we should do that for
>> SSE to support existing Linux binaries that have 4-byte stack alignment.
>> If it helps, I can compare -m32 -O3 -msse2 -mfpmath=sse results with
>> SPEC CPU 2006, before and after my patch.
>>
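The GCC 3.4 default quoted above can be restated as a small standalone function, which makes the -Os case explicit: on ia32 the default drops to 32 bits, i.e. 4 bytes. This is a sketch, not GCC code; the function names and parameters are stand-ins for optimize_size and TARGET_64BIT, and like the original it only describes the default used when -mpreferred-stack-boundary= is not given.

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone model of the GCC 3.4 default quoted above.  The parameters
   stand in for GCC's optimize_size and TARGET_64BIT; the result is in
   bits, like ix86_preferred_stack_boundary.  */
static unsigned default_preferred_stack_boundary(bool optimize_size,
                                                 bool target_64bit)
{
    if (optimize_size)
        return target_64bit ? 128u : 32u; /* -Os on ia32: 4 bytes */
    return 128u;                          /* 16 bytes, for SSE __m128 */
}

/* Convert a boundary in bits to bytes.  */
static unsigned bits_to_bytes(unsigned bits)
{
    assert(bits % 8 == 0);
    return bits / 8;
}
```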
>
> Here are the SPEC CPU 2006 differences with -m32 -O3 -msse2 -mfpmath=sse
> -ffast-math -funroll-loops, before and after my patch:
>
> 400.perlbench         -0.384615%
> 401.bzip2              0%
> 403.gcc               -0.362319%
> 429.mcf               -0.813008%
> 445.gobmk              0.921659%
> 456.hmmer              0.549451%
> 458.sjeng             -0.438596%
> 462.libquantum         0%
> 464.h264ref            0%
> 471.omnetpp           -0.478469%
> 473.astar             -0.645161%
> 483.xalancbmk         -0.727273%
> SPECint(R)_base2006   -0.411523%
> 410.bwaves            -0.406504%
> 416.gamess             0%
> 433.milc              -1.36986%
> 434.zeusmp            -0.44843%
> 435.gromacs            0%
> 436.cactusADM          0%
> 437.leslie3d          -0.888889%
> 444.namd               1.20482%
> 447.dealII            -0.350877%
> 450.soplex            -0.31746%
> 453.povray             0.458716%
> 454.calculix           0%
> 459.GemsFDTD           0%
> 465.tonto              0%
> 470.lbm                0%
> 481.wrf                0.480769%
> 482.sphinx3            0.940439%
> SPECfp(R)_base2006     0%
>
> I think we should realign the stack if SSE variables are put on the stack.
>

The Darwin ia32 psABI specifies 16-byte stack alignment, and GCC
enforces it there with

#define PREFERRED_STACK_BOUNDARY                        \
  MAX (STACK_BOUNDARY, ix86_preferred_stack_boundary)
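Taking the MAX means a smaller request cannot weaken the guarantee: whatever ix86_preferred_stack_boundary is set to, the effective value never falls below STACK_BOUNDARY. The sketch below models this in bits and assumes STACK_BOUNDARY is 128 on Darwin, which is what the 16-byte enforcement implies but is an assumption here, not a value read from the Darwin headers. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Model of the Darwin PREFERRED_STACK_BOUNDARY definition quoted above,
   in bits.  MODEL_STACK_BOUNDARY = 128 is an assumption made for this
   sketch, not a value taken from the Darwin headers.  */
#define MODEL_STACK_BOUNDARY 128u

static unsigned model_darwin_preferred_boundary(unsigned ix86_preferred)
{
    /* MAX (STACK_BOUNDARY, ix86_preferred_stack_boundary): a smaller
       request, such as 32 bits from -mpreferred-stack-boundary=2, cannot
       lower the result below STACK_BOUNDARY.  */
    return ix86_preferred > MODEL_STACK_BOUNDARY ? ix86_preferred
                                                 : MODEL_STACK_BOUNDARY;
}
```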

On other ia32 targets, 4-byte outgoing stack alignment is
correct and allowed. My patch assumes 4-byte incoming stack
alignment only when SSE variables are put on the stack. The
automatic stack alignment implementation is quite efficient;
its performance impact is small, as the SPEC CPU 2006 results
show: no individual benchmark changes by more than about 1.4%,
SPECint_base2006 changes by -0.41%, and SPECfp_base2006 is
unchanged. It also fixes a regression:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156
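The policy described above has two parts: decide whether realignment is needed, and if so round the stack pointer down to the required boundary, which is the effect of an instruction such as "andl $-16, %esp". A minimal sketch of both parts, assuming 4-byte incoming alignment as the patch does; the helper names are hypothetical, and a real prologue must also keep a way to reach the incoming arguments and restore the original stack pointer on return, which this does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the policy: assume only `incoming_align` bytes of
   incoming alignment and realign only when some stack variable (for
   example an SSE __m128) needs more than that.  */
static bool needs_realignment(size_t incoming_align, size_t max_var_align)
{
    return max_var_align > incoming_align;
}

/* Round the stack pointer down to a power-of-two boundary, like
   "andl $-16, %esp" when align is 16.  The stack grows downward, so
   rounding down only ever reserves extra space.  */
static uintptr_t realign_stack_pointer(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}
```

With 4-byte incoming alignment and a 16-byte SSE variable, needs_realignment(4, 16) is true, and a stack pointer of 0xbffff00c is rounded down to 0xbffff000.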

OK for trunk?

Thanks.

-- 
H.J.