From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30095 invoked by alias); 17 Nov 2010 19:23:40 -0000 Received: (qmail 30087 invoked by uid 22791); 17 Nov 2010 19:23:38 -0000 X-SWARE-Spam-Status: No, hits=-0.4 required=5.0 tests=AWL,BAYES_50,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,TW_AV,TW_VZ,TW_ZJ X-Spam-Check-By: sourceware.org Received: from mail-px0-f175.google.com (HELO mail-px0-f175.google.com) (209.85.212.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 17 Nov 2010 19:23:34 +0000 Received: by pxi17 with SMTP id 17so655735pxi.20 for ; Wed, 17 Nov 2010 11:23:32 -0800 (PST) MIME-Version: 1.0 Received: by 10.142.11.17 with SMTP id 17mr1005822wfk.105.1290021812380; Wed, 17 Nov 2010 11:23:32 -0800 (PST) Received: by 10.143.161.2 with HTTP; Wed, 17 Nov 2010 11:23:32 -0800 (PST) In-Reply-To: References: <20101117064421.GA30836@intel.com> Date: Wed, 17 Nov 2010 20:26:00 -0000 Message-ID: Subject: Re: PATCH: Properly check the end of basic block From: Uros Bizjak To: "H.J. Lu" Cc: gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-11/txt/msg01843.txt.bz2 On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu wrote: > On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak wrote: >> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu wrote: >> >>> insn !=3D BB_END (bb) && NEXT_INSN (insn) =3D=3D NEXT_INSN (BB_END (bb)) >>> >>> We should check NEXT_INSN (insn) !=3D NEXT_INSN (BB_END (bb)) in >>> move_or_delete_vzeroupper_2. =A0This patch does it. >> >> Huh? The loop does simple linear scan of all insns in the bb, so it >> can't miss BB_END. IIUC, in your case the bb does not have BB_END >> (bb), but it has NEXT_INSN (BB_END (bb))? > > It has BB_END, but it won't be visited by NEXT_INSN starting from > BB_HEAD. insn !=3D NEXT_INSN (BB_END (bb)) is used to check the > end of the BB everywhere in gcc. > >> Can you please provide a test case that illustrates this? >> > > I am enclosing a work in progress. =A0We noticed that we are > missing a few vzerouppers at -O3 on SPEC CPU 2K/2006. > One isssue is we may have > > foo: > > =A0 =A0 =A0 call bar <<<<< Missing vzeroupper > > =A0 =A0 =A0 256bit vectorized insn > =A0 =A0 =A0 goto foo > > We miss vzeroupper before call bar. =A0We don't have a small testcase. > But this patch fixes this case by inspection. We are checking other > cases. @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set) bb->index, upper_128bits_set); insn =3D BB_HEAD (bb); + last =3D NEXT_INSN (BB_END (bb)); while (insn !=3D BB_END (bb)) { insn =3D NEXT_INSN (insn); + if (insn =3D=3D last) + break; if (!NONDEBUG_INSN_P (insn)) continue; The change above is not needed. The new check is never triggered - the loop terminates when "insn =3D=3D BB_END (bb)" at "while", so I fail to see why additional termination for "NEXT_INSN (insn) =3D=3D NEXT_INSN (BB_END (bb))" is needed. (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped with NEXT_INSN.) @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style) /* Emit vzeroupper if needed. */ if (TARGET_VZEROUPPER - && cfun->machine->use_avx256_p + && (cfun->machine->use_avx256_p || flag_tree_vectorize) && !cfun->machine->caller_return_avx256_p) { cfun->machine->use_vzeroupper_p =3D 1; @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx calla= rg1, } /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration. */ - if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p) + if (TARGET_VZEROUPPER + && (cfun->machine->use_avx256_p || flag_tree_vectorize)) Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with -ftree-vectorize?! It looks that parts (or state machine) that set ...->use_avx256_p flag should be fixed. { rtx unspec; int avx256; diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c new file mode 100644 index 0000000..3301083 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx -mtune=3Dgeneric -dp" } */ + +extern void free (void *); +void +bar (void *ncstrp) +{ + if(ncstrp=3D=3D((void *)0)) + return; + free(ncstrp); +} + +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */ Hm, this testcase doesn't go together with the above change. There is no vectorization involved, and the scan checks that vzeroupper is NOT emitted. Uros.