From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-479097-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 65746 invoked by alias); 3 Mar 2015 05:10:52 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 65709 invoked by uid 48); 3 Mar 2015 05:10:49 -0000
From: "msebor at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2" basic block vectorized using SLP" 1
Date: Tue, 03 Mar 2015 05:10:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: testsuite
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: msebor at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-63175-4-tHOPS9QEck@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-63175-4@http.gcc.gnu.org/bugzilla/>
References: <bug-63175-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-03/txt/msg00241.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175
--- Comment #24 from Martin Sebor <msebor at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> Why is the loop bound to i != 16 / sizeof *s?

The upper bound is intended to make the copied sequence fit into one vector
register, irrespective of the size of the array element.
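To illustrate the bound: a loop running while i != 16 / sizeof *s always moves 16 bytes, which is one 128-bit vector register. A minimal sketch follows. The function names and the macro are hypothetical and are not the actual testsuite code. The sketch also assumes that sizeof of each element type divides 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy loops modeled on the bound in question: the trip
   count is 16 / sizeof *s, so each loop copies exactly 16 bytes (one
   128-bit vector register's worth) regardless of the element type.
   Each function returns the number of bytes it copied. */
#define DEFINE_COPY16(name, T)                         \
  size_t name (T *restrict d, const T *restrict s)     \
  {                                                    \
    size_t i;                                          \
    for (i = 0; i != 16 / sizeof *s; ++i)              \
      d[i] = s[i];                                     \
    return i * sizeof *s; /* bytes copied */           \
  }

DEFINE_COPY16 (copy16_char, char)    /* 16 iterations */
DEFINE_COPY16 (copy16_i16, int16_t)  /*  8 iterations */
DEFINE_COPY16 (copy16_i32, int32_t)  /*  4 iterations */
```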

The vector load and store instructions tolerate unaligned accesses, and there
are permute instructions that combine the contents of two vector registers into
a single one to compensate for unaligned reads or writes.  I'm not sure it
makes sense to expect unaligned copies of just one vector register's worth of
data to be vectorized (as my proposed tests for char and short expect), but I
would expect larger unaligned copies (i.e., multiples of 16 bytes) to benefit
from vectorization.  In my experiments I've seen no evidence of GCC attempting
to vectorize such copies, but I need to do more research to understand why.
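The kind of larger unaligned copy meant here might look like the sketch below. It copies four 16-byte blocks of shorts from a source that starts one element past the beginning of the array. If the array is 16-byte aligned, every vector load is then misaligned by sizeof (short) bytes. The function and macro names are hypothetical, and the block count of 4 is an arbitrary choice.

```c
#include <stddef.h>

/* Hypothetical unaligned copy of NBLOCKS 16-byte blocks.  Reading from
   s + 1 means that, if s is 16-byte aligned, every 16-byte chunk of the
   source straddles a vector-register boundary. */
#define NBLOCKS 4
#define NELEMS  ((NBLOCKS * 16) / sizeof (short))

void copy_unaligned (short *restrict d, const short *restrict s)
{
  for (size_t i = 0; i != NELEMS; ++i)
    d[i] = s[i + 1];  /* source offset by one element */
}
```

One way to check whether GCC reports vectorizing a loop like this is to compile with -O3 -maltivec -fopt-info-vec.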

(In reply to comment #23)

The test uses -maltivec, and that's what I've been using as well.  But I see in
the Power ISA book that lxvw4x and stxvw4x are classified as VSX instructions,
so perhaps they shouldn't be emitted without -mvsx.  That said, GCC 5.0 doesn't
emit them even with -mvsx.
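The distinction between the two options can be made visible with the macros GCC predefines on PowerPC. It defines __ALTIVEC__ under -maltivec and __VSX__ under -mvsx, and -mvsx also enables AltiVec. The sketch below is hypothetical. It only reports which case a translation unit was compiled under. The argument above is that VSX-only loads and stores such as lxvw4x and stxvw4x arguably belong only in the first branch.

```c
/* Hypothetical helper: reports which PowerPC vector extension the
   compiler was allowed to use for this translation unit.  __VSX__ is
   tested first because -mvsx also defines __ALTIVEC__. */
const char *vector_isa (void)
{
#if defined (__VSX__)
  return "VSX (lxvw4x/stxvw4x may be used)";
#elif defined (__ALTIVEC__)
  return "AltiVec only (no VSX loads/stores expected)";
#else
  return "no PowerPC vector extensions enabled";
#endif
}
```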