Date: Mon, 19 Mar 2018 11:55:00 -0000
From: Richard Biener
To: Tom de Vries
cc: gcc-patches@gcc.gnu.org, Jan Hubicka
Subject: Re: [PATCH] Fix PR84512

On Fri, 16 Mar 2018, Tom de Vries wrote:

> On 03/16/2018 12:55 PM, Richard Biener wrote:
> > On Fri, 16 Mar 2018, Tom de Vries wrote:
> >
> > > On 02/27/2018 01:42 PM, Richard Biener wrote:
> > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> > > > ===================================================================
> > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (nonexistent)
> > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (working copy)
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > > > +
> > > > +int foo()
> > > > +{
> > > > +  int a[10];
> > > > +  for(int i = 0; i < 10; ++i)
> > > > +    a[i] = i*i;
> > > > +  int res = 0;
> > > > +  for(int i = 0; i < 10; ++i)
> > > > +    res += a[i];
> > > > +  return res;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
> > >
> > > This fails for nvptx, because it doesn't have the required vector
> > > operations.
> > > To fix the fail, I've added requiring effective target vect_int_mult.
> >
> > On targets that do not vectorize you should see the scalar loops unrolled
> > instead. Or do you have only one loop vectorized?
>
> Sort of. Loop vectorization has no effect, and the scalar loops are completely
> unrolled. But then slp vectorization vectorizes the stores.
>
> So at optimized we have:
> ...
>   MEM[(int *)&a] = { 0, 1 };
>   MEM[(int *)&a + 8B] = { 4, 9 };
>   MEM[(int *)&a + 16B] = { 16, 25 };
>   MEM[(int *)&a + 24B] = { 36, 49 };
>   MEM[(int *)&a + 32B] = { 64, 81 };
>   _6 = a[0];
>   _28 = a[1];
>   res_29 = _6 + _28;
>   _35 = a[2];
>   res_36 = res_29 + _35;
>   _42 = a[3];
>   res_43 = res_36 + _42;
>   _49 = a[4];
>   res_50 = res_43 + _49;
>   _56 = a[5];
>   res_57 = res_50 + _56;
>   _63 = a[6];
>   res_64 = res_57 + _63;
>   _70 = a[7];
>   res_71 = res_64 + _70;
>   _77 = a[8];
>   res_78 = res_71 + _77;
>   _2 = a[9];
>   res_11 = _2 + res_78;
>   a ={v} {CLOBBER};
>   return res_11;
> ...
>
> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
> we have:
> ...
> .visible .func (.param.u32 %value_out) foo
> {
>         .reg.u32 %value;
>         .local .align 16 .b8 %frame_ar[48];
>         .reg.u64 %frame;
>         cvta.local.u64 %frame, %frame_ar;
>         mov.u32 %value, 285;
>         st.param.u32 [%value_out], %value;
>         ret;
> }
> ...
>
> > That's precisely
> > what the PR was about... which means it isn't fixed for nvptx :/
>
> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
> code at optimized.
>
> FWIW, using this patch we generate optimal code at optimized:
> ...
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 3ebcfc30349..6b64f600c4a 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3. If not see
>        NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> +      NEXT_PASS (pass_fre);
>        NEXT_PASS (pass_strlen);
>        NEXT_PASS (pass_thread_jumps);
>        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> ...
>
> and we get:
> ...
> .visible .func (.param.u32 %value_out) foo
> {
>         .reg.u32 %value;
>         mov.u32 %value, 285;
>         st.param.u32 [%value_out], %value;
>         ret;
> }
> ...
>
> I could file a missing optimization PR for nvptx, but I'm not sure where this
> should be fixed.

Ah, yeah... the usual issue then.  Can you please XFAIL the test on nvptx
instead of requiring vect_int_mult?

Thanks,
Richard.
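
For illustration only, a minimal sketch of how the requested XFAIL might look in the testcase. The `nvptx*-*-*` target selector is an assumption (the thread does not spell out the exact selector); the function body is the test as quoted in the thread, with only the dg-final line changed.

```c
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-optimized" } */

/* Sum of squares 0..9; fully optimized, this folds to the constant 285.  */
int foo()
{
  int a[10];
  for(int i = 0; i < 10; ++i)
    a[i] = i*i;
  int res = 0;
  for(int i = 0; i < 10; ++i)
    res += a[i];
  return res;
}

/* Assumed selector: expect the scan to fail on nvptx rather than
   restricting the test to vect_int_mult targets.  */
/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail nvptx*-*-* } } } */
```

Compared with requiring vect_int_mult, an xfail would keep the test running on every target and record the nvptx result as an expected failure, so it would show up as an XPASS if the missed optimization were later fixed.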