From: "Andre Vieira (lists)"
To: Richard Biener, gcc-patches@gcc.gnu.org
Date: Fri, 1 Mar 2024 09:24:58 +0000
Subject: Re: [PATCH] tree-optimization/110221 - SLP and loop mask/len
Message-ID: <6846f165-1cfb-415c-9a47-e620c784dc96@arm.com>
In-Reply-To: <20231110131658.09A5D13398@imap2.suse-dmz.suse.de>

Hi,

I bootstrapped and tested the gcc-13 backport of this on gcc-12 for
aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu, with no regressions.

OK to push to gcc-12 branch?

Kind regards,
Andre Vieira

On 10/11/2023 13:16, Richard Biener wrote:
> The following fixes the issue that when SLP stmts are internal defs
> but appear invariant because they end up only using invariant defs
> then they get scheduled outside of the loop.  This nice optimization
> breaks down when loop masks or lens are applied since those are not
> explicitly tracked as dependences.  The following makes sure to never
> schedule internal defs outside of the vectorized loop when the
> loop uses masks/lens.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
>     PR tree-optimization/110221
>     * tree-vect-slp.cc (vect_schedule_slp_node): When loop
>     masking / len is applied make sure to not schedule
>     internal defs outside of the loop.
>
>     * gfortran.dg/pr110221.f: New testcase.
> ---
>  gcc/testsuite/gfortran.dg/pr110221.f | 17 +++++++++++++++++
>  gcc/tree-vect-slp.cc                 | 10 ++++++++++
>  2 files changed, 27 insertions(+)
>  create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f
>
> diff --git a/gcc/testsuite/gfortran.dg/pr110221.f b/gcc/testsuite/gfortran.dg/pr110221.f
> new file mode 100644
> index 00000000000..8b57384313a
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/pr110221.f
> @@ -0,0 +1,17 @@
> +C PR middle-end/68146
> +C { dg-do compile }
> +C { dg-options "-O2 -w" }
> +C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { target avx512f } }
> +      SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
> +      IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
> +      IMPLICIT COMPLEX*16 (C,Z)
> +      DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
> +      N=INT(V)
> +      CALL GAMMA2(VG,GA)
> +      DO 65 K=1,N
> +        CBY(K)=CYY
> +65    CONTINUE
> +      CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
> +      DO 70 K=1,N
> +70    CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
> +      END
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 3e5814c3a31..80e279d8f50 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
>        /* Emit other stmts after the children vectorized defs which is
>           earliest possible.  */
>        gimple *last_stmt = NULL;
> +      if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> +        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +            || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +          {
> +            /* But avoid scheduling internal defs outside of the loop when
> +               we might have only implicitly tracked loop mask/len defs.  */
> +            gimple_stmt_iterator si
> +              = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
> +            last_stmt = *si;
> +          }
>        bool seen_vector_def = false;
>        FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>          if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
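
For readers following the hunk above, the effect of seeding last_stmt can be sketched with a toy model. This is not GCC code: ToyLoop, pick_insert_after and the integer "positions" are hypothetical names invented for illustration, and the plain integer comparison stands in for the dominance check that the vectorizer actually performs. It only assumes what the patch description states: an internal def is emitted after the latest of its children's vectorized defs, and when the loop uses masks or lengths the search starts from the first statement after the labels in the loop header rather than from nothing.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model only: statement positions are integers that grow in program
// order.  Real GCC compares statements by dominance, not by integers.
struct ToyLoop {
  int header_pos;         // first stmt after the labels in the loop header
  bool uses_mask_or_len;  // stands in for the FULLY_MASKED / WITH_LENGTH checks
};

// Returns the position after which the vectorized stmt would be emitted.
// std::nullopt models last_stmt == NULL (no child constrains placement).
std::optional<int>
pick_insert_after(const std::optional<ToyLoop>& loop,
                  const std::vector<int>& internal_child_defs) {
  std::optional<int> last;
  // The step the patch adds: with masks/lens in use, never go above the
  // loop header, because the mask/len defs are only implicit dependences.
  if (loop && loop->uses_mask_or_len)
    last = loop->header_pos;
  // Existing behaviour: emit after the latest child vectorized def.
  for (int pos : internal_child_defs)
    if (!last || pos > *last)
      last = pos;
  return last;
}

// Small self-check of the three cases discussed below.
void toy_examples() {
  assert(pick_insert_after(ToyLoop{10, false}, {3, 5}).value() == 5);
  assert(pick_insert_after(ToyLoop{10, true}, {3, 5}).value() == 10);
  assert(pick_insert_after(ToyLoop{10, true}, {3, 12}).value() == 12);
}
```

In this sketch, children defined at positions 3 and 5 (before a loop whose header is at 10) let the statement be placed at 5, outside the loop, when no mask or length is in use; with a mask or length the placement is clamped to 10, and a child defined inside the loop at 12 still wins as before.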