From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
Date: Fri, 28 Aug 2015 07:46:00 -0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.4.0
X-Bugzilla-Keywords: alias, missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 6.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021

--- Comment #24 from rguenther at suse dot de ---
On Thu, 27 Aug 2015, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
>
> --- Comment #22 from Bill Schmidt ---
> (In reply to Richard Biener from comment #21)
> > (In reply to Bill Schmidt from comment #20)
> > ......
> >
> > I see it only failing due to cost issues (tried ppc64le and -mcpu=power8).
> > The unaligned loads cost 3 and we end up with
> >
> > t.f90:8:0: note: Cost model analysis:
> >   Vector inside of loop cost: 40
> >   Vector prologue cost: 8
> >   Vector epilogue cost: 4
> >   Scalar iteration cost: 12
> >   Scalar outside cost: 6
> >   Vector outside cost: 12
> >   prologue iterations: 0
> >   epilogue iterations: 0
> > t.f90:8:0: note: cost model: the vector iteration cost = 40 divided by the
> > scalar iteration cost = 12 is greater or equal to the vectorization factor =
> > 1.
> >
> > Note that we are (still) not very good at estimating the SLP cost, as we
> > account for 4 vector loads here (because we essentially will end up with
> > 4 different permutations), so the "unaligned" part is overcounted, and
> > likely the permutation cost as well. Both are limitations of the SLP
> > data structures and not easily fixable. With
> > -fvect-cost-model=unlimited I see both loops vectorized.
>
> Yes, I get these same results for the loop vectorizer (using -O2
> -ftree-vectorize -mcpu=power8 -ffast-math). But I was looking at the failure
> to do SLP vectorization. In comment 19 you indicated this was now working,
> presumably on x86, but for Power we fail to SLP-vectorize
> fast-math-pr37021.f90:9:0.

Err, I meant loop SLP vectorization as opposed to loop vectorization with
interleaving. Basic-block SLP doesn't work because (at least) it does not
handle reductions yet (I have done some early work on this but wasn't able
to finish it).