From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 19580 invoked by alias); 15 Oct 2009 19:11:42 -0000 Received: (qmail 19569 invoked by uid 22791); 15 Oct 2009 19:11:41 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL,BAYES_00,SARE_MSGID_LONG40,SPF_PASS X-Spam-Check-By: sourceware.org Received: from mail-ew0-f221.google.com (HELO mail-ew0-f221.google.com) (209.85.219.221) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 15 Oct 2009 19:11:35 +0000 Received: by ewy21 with SMTP id 21so1203300ewy.8 for ; Thu, 15 Oct 2009 12:11:33 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.88.141 with SMTP id a13mr120554wef.209.1255633893039; Thu, 15 Oct 2009 12:11:33 -0700 (PDT) In-Reply-To: <4AD76CBA.7030204@gmail.com> References: <20090806214216.GA14439@lucon.org> <20091015154823.GA15787@lucon.org> <4AD76CBA.7030204@gmail.com> Date: Thu, 15 Oct 2009 19:22:00 -0000 Message-ID: <6dc9ffc80910151211u470c386eg30f74f6dd6f728bc@mail.gmail.com> Subject: Re: PATCH: PR target/40838: gcc shouldn't assume that the stack is aligned From: "H.J. Lu" To: Uros Bizjak Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2009-10/txt/msg01006.txt.bz2 On Thu, Oct 15, 2009 at 11:40 AM, Uros Bizjak wrote: > On 10/15/2009 05:48 PM, H.J. Lu wrote: >> >> On Thu, Aug 06, 2009 at 02:42:16PM -0700, H.J. Lu wrote: >> >>> >>> Hi, >>> >>> In 32bit, the incoming stack may not be 16 byte aligned. =A0This patch >>> assumes the incoming stack is 4 byte aligned and realigns stack if any >>> SSE variable is put on stack. Any comments? >>> >>> >> >> For i386, gcc, up to gcc 3.4, had been generating 4byte stack alignment. >> 4byte stack alignment is a problem unless SSE registers are used, which >> may be generated by vectorizer or intrinsics. This patch accommodates >> the binaries generated by previous versions of gcc if SSE registers are >> generated by vectorizer or intrinsics. The performance impact on SPEC >> CPU 2006 is minimum: >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D41156#c7 >> >> OK for trunk? >> > > Comments bellow. > >> --- >> gcc/ >> >> 2009-09-26 =A0H.J. Lu >> >> =A0 =A0 =A0 =A0PR target/40838 >> =A0 =A0 =A0 =A0* function.c (allocate_struct_function): Initialize >> =A0 =A0 =A0 =A0forced_stack_alignment and forced_stack_mode. >> >> =A0 =A0 =A0 =A0* function.h (function): Add forced_stack_alignment and >> =A0 =A0 =A0 =A0forced_stack_mode. >> >> =A0 =A0 =A0 =A0* tree-vect-data-refs.c (vect_verify_datarefs_alignment): >> =A0 =A0 =A0 =A0Update forced_stack_alignment and forced_stack_mode. >> >> =A0 =A0 =A0 =A0* config/i386/i386.c (VALID_SSE_VECTOR_MODE): New. >> =A0 =A0 =A0 =A0(ix86_update_stack_boundary): In 32bit, use STACK_BOUNDAR= Y if >> =A0 =A0 =A0 =A0any SSE variables are forced on stack. >> =A0 =A0 =A0 =A0(ix86_minimum_alignment): In 32bit, update >> =A0 =A0 =A0 =A0forced_stack_alignment and forced_stack_mode if any SSE >> =A0 =A0 =A0 =A0variables are put on stack. >> >> gcc/testsuite/ >> >> >> > > ... > >> >> Index: gcc/function.h >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- gcc/function.h =A0 =A0 =A0(revision 152202) >> +++ gcc/function.h =A0 =A0 =A0(working copy) >> @@ -532,6 +532,12 @@ struct GTY(()) function { >> =A0 =A0 =A0 a string describing the reason for failure. =A0*/ >> =A0 =A0const char * GTY((skip)) cannot_be_copied_reason; >> >> + =A0/* The largest alignment forced on the stack. =A0*/ >> + =A0unsigned int forced_stack_alignment; >> + >> + =A0/* The mode of the largest alignment forced on the stack. =A0*/ >> + =A0enum machine_mode forced_stack_mode; >> + >> =A0 =A0/* Collected bit flags. =A0*/ >> >> =A0 =A0/* Number of units of general registers that need saving in stdarg >> Index: gcc/tree-vect-data-refs.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- gcc/tree-vect-data-refs.c =A0 (revision 152202) >> +++ gcc/tree-vect-data-refs.c =A0 (working copy) >> @@ -927,6 +927,8 @@ vect_verify_datarefs_alignment (loop_vec >> =A0 =A0 =A0{ >> =A0 =A0 =A0 =A0gimple stmt =3D DR_STMT (dr); >> =A0 =A0 =A0 =A0stmt_vec_info stmt_info =3D vinfo_for_stmt (stmt); >> + =A0 =A0 =A0enum machine_mode mode; >> + =A0 =A0 =A0unsigned int align; >> >> =A0 =A0 =A0 =A0/* For interleaving, only the alignment of the first acce= ss >> matters. =A0*/ >> =A0 =A0 =A0 =A0if (STMT_VINFO_STRIDED_ACCESS (stmt_info) >> @@ -947,6 +949,15 @@ vect_verify_datarefs_alignment (loop_vec >> =A0 =A0 =A0 =A0 =A0 =A0 =A0} >> =A0 =A0 =A0 =A0 =A0 =A0return false; >> =A0 =A0 =A0 =A0 =A0} >> + >> + =A0 =A0 =A0mode =3D TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info)); >> + =A0 =A0 =A0align =3D GET_MODE_ALIGNMENT (mode); >> + =A0 =A0 =A0if (cfun->forced_stack_alignment< =A0align) >> + =A0 =A0 =A0 { >> + =A0 =A0 =A0 =A0 cfun->forced_stack_alignment =3D align; >> + =A0 =A0 =A0 =A0 cfun->forced_stack_mode =3D mode; >> + =A0 =A0 =A0 } >> + >> =A0 =A0 =A0 =A0if (supportable_dr_alignment !=3D dr_aligned >> =A0 =A0 =A0 =A0 =A0 =A0&& =A0vect_print_dump_info (REPORT_ALIGNMENT)) >> =A0 =A0 =A0 =A0 =A0fprintf (vect_dump, "Vectorizing an unaligned access.= "); >> Index: gcc/config/i386/i386.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- gcc/config/i386/i386.c =A0 =A0 =A0(revision 152202) >> +++ gcc/config/i386/i386.c =A0 =A0 =A0(working copy) >> @@ -8151,16 +8151,31 @@ find_drap_reg (void) >> =A0 =A0 =A0} >> =A0} >> >> +#define VALID_SSE_VECTOR_MODE(MODE) \ >> + =A0((MODE) =3D=3D V4SFmode || (MODE) =3D=3D V4SImode || (MODE) =3D=3D = V2DFmode \ >> + =A0 || (MODE) =3D=3D V16QImode || (MODE) =3D=3D V8HImode || (MODE) =3D= =3D V2DImode) >> + >> =A0/* Update incoming stack boundary and estimated stack alignment. =A0*/ >> >> =A0static void >> =A0ix86_update_stack_boundary (void) >> =A0{ >> + =A0/* Should we use STACK_BOUNDARY for incoming stack boundary? =A0*/ >> + =A0unsigned int incoming_stack_boundary; >> + >> + =A0/* In 32bit, use STACK_BOUNDARY for incoming stack boundary if any >> + =A0 =A0 SSE variables are put on stack. */ >> + =A0if (!TARGET_64BIT >> +&& =A0VALID_SSE_VECTOR_MODE (cfun->forced_stack_mode)) >> + =A0 =A0incoming_stack_boundary =3D STACK_BOUNDARY; >> + =A0else >> + =A0 =A0incoming_stack_boundary =3D ix86_default_incoming_stack_boundar= y; >> + >> =A0 =A0/* Prefer the one specified at command line. */ >> =A0 =A0ix86_incoming_stack_boundary >> =A0 =A0 =A0=3D (ix86_user_incoming_stack_boundary >> =A0 =A0 =A0 =A0 ? ix86_user_incoming_stack_boundary >> - =A0 =A0 =A0 : ix86_default_incoming_stack_boundary); >> + =A0 =A0 =A0 : incoming_stack_boundary); >> >> =A0 =A0/* Incoming stack alignment can be changed on individual functions >> =A0 =A0 =A0 via force_align_arg_pointer attribute. =A0We use the smallest >> @@ -19773,7 +19788,7 @@ ix86_minimum_alignment (tree exp, enum m >> =A0{ >> =A0 =A0tree type, decl; >> >> - =A0if (TARGET_64BIT || align !=3D 64 || ix86_preferred_stack_boundary>= =3D 64) >> + =A0if (TARGET_64BIT) >> =A0 =A0 =A0return align; >> >> =A0 =A0if (exp&& =A0DECL_P (exp)) >> @@ -19787,6 +19802,25 @@ ix86_minimum_alignment (tree exp, enum m >> =A0 =A0 =A0 =A0decl =3D NULL; >> =A0 =A0 =A0} >> >> + =A0/* In 32bit, update forced_stack_mode if any SSE variables are put >> + =A0 =A0 on stack. =A0*/ >> + =A0if (cfun->forced_stack_alignment< =A0128) >> + =A0 =A0{ >> + =A0 =A0 =A0if (VALID_SSE_VECTOR_MODE (mode)) >> + =A0 =A0 =A0 { >> + =A0 =A0 =A0 =A0 cfun->forced_stack_alignment =3D 128; >> + =A0 =A0 =A0 =A0 cfun->forced_stack_mode =3D mode; >> + =A0 =A0 =A0 } >> + =A0 =A0 =A0else if ((type&& =A0VALID_SSE_VECTOR_MODE (TYPE_MODE (type)= ))) >> + =A0 =A0 =A0 { >> + =A0 =A0 =A0 =A0 cfun->forced_stack_alignment =3D 128; >> + =A0 =A0 =A0 =A0 cfun->forced_stack_mode =3D TYPE_MODE (type); >> + =A0 =A0 =A0 } >> + =A0 =A0} >> + >> + =A0if (align !=3D 64 || ix86_preferred_stack_boundary>=3D 64) >> + =A0 =A0return align; >> + >> > > We should use PREFERRED_STACK_BOUNDARY_DEFAULT instead of hardcoded 128 > here. At least the comment for P_S_B_D says: > > /* It should be MIN_STACK_BOUNDARY. =A0But we set it to 128 bits for > =A0 both 32bit and 64bit, to support codes that need 128 bit stack > =A0 alignment for SSE instructions, but can't realign the stack. =A0*/ > #define PREFERRED_STACK_BOUNDARY_DEFAULT 128 In forced_stack_alignment, we collect the hard alignment requirement on stack. The decision if stack should be aligned depends on the incoming stack alignment and the hard stack alignment. It is separate from PREFERRED_STACK_BOUNDARY_DEFAULT. > OTOH, I think that this whole stuff should depend on -mstackrealing someh= ow, > according to the comment: > > /* 1 if -mstackrealign should be turned on by default. =A0It will > =A0 generate an alternate prologue and epilogue that realigns the > =A0 runtime stack if nessary. =A0This supports mixing codes that keep a > =A0 4-byte aligned stack, as specified by i386 psABI, with codes that > =A0 need a 16-byte aligned stack, as required by SSE instructions. =A0If > =A0 STACK_REALIGN_DEFAULT is 1 and PREFERRED_STACK_BOUNDARY_DEFAULT is > =A0 128, stacks for all functions may be realigned. =A0*/ > #define STACK_REALIGN_DEFAULT 0 > > As far as I understand from many comments from the PR40838 trail (especia= lly > comment #51), -mstackrealing is not effective in some cases involving > automatic SSE variables on the stack. I think that -mstackrealign should = be > fixed in the way your patch outlines, so old/broken sources that assume > 4byte alignment can be compiled using this option without penalizing > new/fixed code that assumes 16byte alignment. > -mstackrealign is very effective. If we turn it on by default, it works fin= e: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D40838#c45 It is just that we don't want to make it the default since we will realign the stack for almost all functions. In most cases, misaligned stack won't cause any problems. What my patch does is to assume 4byte incoming stack alignment for functions containing SSE instructions which require hard 16byte stack alignment. After my patch is applied, we can update -mstackrealign to make it an no-op since all the problems it tries to solve: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00854.html http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00526.html disappeared with my patch. Thanks. --=20 H.J.