From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-478995-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 31800 invoked by alias); 2 Mar 2015 14:24:39 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 31681 invoked by uid 48); 2 Mar 2015 14:24:36 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2" basic block vectorized using SLP" 1
Date: Mon, 02 Mar 2015 14:24:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: testsuite
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-63175-4-mzU8wOWRoJ@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-63175-4@http.gcc.gnu.org/bugzilla/>
References: <bug-63175-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-03/txt/msg00139.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #15)
> Btw, first of all unaligned stores are not supported according to the targets
> vectorization hook, thus you'd need to peel the loop to make the store
> aligned
> which for some reason doesn't happen.

Quite obvious - the loop iterates 8 times but the vectorization factor is 8
as well, so if we peel off an iteration to align the destination the vectorized
loop will never be entered.
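
To spell out that arithmetic, here is a minimal sketch in C. It is not GCC
code; the function name and signature are made up for illustration, and it
only assumes that peeled iterations run as scalar code before the vector loop.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: number of full vector iterations
   left once `peeled` scalar iterations have been split off in front of
   the loop to align the store.  */
unsigned
vector_iters_after_peel (unsigned niters, unsigned vf, unsigned peeled)
{
  assert (vf > 0);
  if (peeled >= niters)
    return 0;                      /* everything ran in the scalar prologue */
  return (niters - peeled) / vf;   /* the remainder goes to the scalar epilogue */
}
```

With niters = 8 and vf = 8, peeling a single iteration leaves only 7
iterations, so the helper returns 0 and the vector body is dead. Without
peeling it would return 1.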

Why is the loop bound set to i != 16 / sizeof *s?

>  But when peeled you certainly will see
> byte/short/word stores at least.

For example, when I increase the iteration count I get this for copy_short_0_1:

.L.copy_Type_0_1:
        addis 6,2,.LANCHOR0@toc@ha
        addis 7,2,.LANCHOR1@toc@ha
        addi 6,6,.LANCHOR0@toc@l
        addi 7,7,.LANCHOR1@toc@l
        li 8,7
        addi 9,6,2
        mr 10,7
        mtctr 8
        .p2align 4,,15
.L2:
        addi 10,10,2
        lhz 8,-2(10)
        addi 9,9,2
        sth 8,-2(9)
        bdnz .L2
        addi 8,7,14
        addi 7,7,29
        neg 5,8
        lvx 1,0,8
        lvx 0,0,7
        li 7,16
        lvsr 13,0,5
        addi 8,10,14
        addi 9,9,14
        addi 10,10,16
        vperm 0,1,0,13
        stvx 0,6,7
        .p2align 4,,15
.L3:
        lhzu 7,2(8)
        cmpld 7,10,8
        sthu 7,2(9)
        bne+ 7,.L3
        blr

The cost model should probably reject this, but it does not:

t.c:36:1: note: Cost model analysis:
  Vector inside of loop cost: 3
  Vector prologue cost: 17
  Vector epilogue cost: 2
  Scalar iteration cost: 2
  Scalar outside cost: 0
  Vector outside cost: 19
  prologue iterations: 7
  epilogue iterations: 1
  Calculated minimum iters for profitability: 10
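
The threshold of 10 can be reproduced from the other numbers in the dump. The
sketch below is written from memory of how the vectorizer's
vect_estimate_min_profitable_iters computes it, so treat the exact formula as
an assumption rather than a copy of the GCC source. The function and parameter
names are made up, and the vectorization factor of 8 is taken from the
discussion above (it is not printed in the dump).

```c
#include <assert.h>

/* Assumed reconstruction of the profitability threshold, not GCC code.
   Returns -1 when a vector iteration is never cheaper than VF scalar
   iterations.  */
int
min_profitable_iters (int vf, int vec_inside_cost, int vec_outside_cost,
                      int scalar_iter_cost, int scalar_outside_cost,
                      int prologue_iters, int epilogue_iters)
{
  assert (vf > 0);

  /* Cost saved by one vector iteration over VF scalar iterations.  */
  int saving = scalar_iter_cost * vf - vec_inside_cost;
  if (saving <= 0)
    return -1;
  if (vec_outside_cost <= 0)
    return 1;

  int extra_outside = (vec_outside_cost - scalar_outside_cost) * vf;
  int min_iters = (extra_outside
                   - vec_inside_cost * prologue_iters
                   - vec_inside_cost * epilogue_iters) / saving;

  /* Round up if the scalar loop is still no more expensive at this count.  */
  if (scalar_iter_cost * vf * min_iters
      <= vec_inside_cost * min_iters + extra_outside)
    min_iters++;

  return min_iters;
}
```

Plugging in the dump values (inside cost 3, outside cost 19 = 17 + 2, scalar
iteration cost 2, scalar outside cost 0, 7 prologue and 1 epilogue iterations)
gives (19 * 8 - 3 * 7 - 3 * 1) / (2 * 8 - 3) = 128 / 13 = 9, which the final
check bumps to 10, matching the reported value.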