From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-495779-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 112543 invoked by alias); 28 Aug 2015 07:46:07 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 112464 invoked by uid 55); 28 Aug 2015 07:46:03 -0000
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
Date: Fri, 28 Aug 2015 07:46:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.4.0
X-Bugzilla-Keywords: alias, missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 6.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-37021-4-IOlJVMbkIu@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-37021-4@http.gcc.gnu.org/bugzilla/>
References: <bug-37021-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-08/txt/msg01921.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 27 Aug 2015, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
> 
> --- Comment #22 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #21)
> > (In reply to Bill Schmidt from comment #20)
> 
> ...<snip>...
> > 
> > I see it only failing due to cost issues (tried ppc64le and -mcpu=power8).
> > The unaligned loads cost 3 and we end up with
> > 
> > t.f90:8:0: note: Cost model analysis:
> >   Vector inside of loop cost: 40
> >   Vector prologue cost: 8
> >   Vector epilogue cost: 4
> >   Scalar iteration cost: 12
> >   Scalar outside cost: 6
> >   Vector outside cost: 12
> >   prologue iterations: 0
> >   epilogue iterations: 0
> > t.f90:8:0: note: cost model: the vector iteration cost = 40 divided by the
> > scalar iteration cost = 12 is greater or equal to the vectorization factor =
> > 1.
> > 
> > Note that we are (still) not very good in estimating the SLP cost as we
> > account 4 vector loads here (because we essentially will end up with
> > 4 different permutations used), so the "unaligned" part is accounted for
> > too much and likely the permutation cost as well.  Both are a limitation
> > of the SLP data structures and not easily fixable.  With
> > -fvect-cost-model=unlimited I see both loops vectorized.
> 
> Yes, I get these same results for the loop vectorizer (using -O2
> -ftree-vectorize -mcpu=power8 -ffast-math).  But I was looking at the failure
> to do SLP vectorization.  In comment 19 you indicated this was now working,
> presumably on x86, but for Power we fail to SLP-vectorize
> fast-math-pr37021.f90:9:0.

Err, I meant loop SLP vectorization, as opposed to loop vectorization
with interleaving.  Basic-block SLP doesn't work because (at least)
it does not handle reductions yet (I have done some early work here
but wasn't able to finish it).
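
For reference, the profitability test behind the "cost model" note quoted
above can be sketched roughly as below.  This is only an illustration that
follows the wording of the dump message, not the vectorizer's actual code;
the function name and its parameters are made up for the example.

```c
#include <stdbool.h>

/* Hypothetical helper, not a GCC function.  It follows the wording of the
   dump note: vectorization is rejected when the vector iteration cost
   divided by the scalar iteration cost is greater than or equal to the
   vectorization factor.  The comparison is written as a multiplication
   so that no integer division (and no rounding) is involved.  */
static bool
vect_looks_profitable (int vec_inside_cost, int scalar_iter_cost, int vf)
{
  return vec_inside_cost < scalar_iter_cost * vf;
}
```

With the numbers from the dump (vector iteration cost 40, scalar iteration
cost 12, vectorization factor 1) this returns false, which matches the loop
being rejected unless -fvect-cost-model=unlimited is given.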