From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 65746 invoked by alias); 3 Mar 2015 05:10:52 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 65709 invoked by uid 48); 3 Mar 2015 05:10:49 -0000
From: "msebor at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2 "basic block vectorized using SLP" 1
Date: Tue, 03 Mar 2015 05:10:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: testsuite
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: msebor at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.9.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-03/txt/msg00241.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175

--- Comment #24 from Martin Sebor ---
(In reply to Richard Biener from comment #16)
> Why is the loop bound to i != 16 / sizeof *s?

The upper bound is intended to make the copied sequence fit into one vector
register, irrespective of the size of the array element. The vector load and
store instructions tolerate unaligned accesses and there are permute
instructions that combine the contents of two vector registers into a single
one to compensate for unaligned reads or writes.
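To illustrate the bound, here is a minimal sketch of that kind of loop. It is
not the actual test from the bug: the names DEFINE_COPY, copy_char, copy_short,
copy_int and check_unaligned_short_copy are made up for illustration, and
VECTOR_BYTES assumes a 16-byte vector register. Whether GCC vectorizes any of
these loops depends on the target options and the cost model.

```c
#include <stddef.h>
#include <string.h>

/* Assumed size in bytes of one vector register.  */
enum { VECTOR_BYTES = 16 };

/* Hypothetical helper generator (not from the testsuite): defines a
   function that copies exactly VECTOR_BYTES bytes' worth of elements,
   whatever the element type T is.  */
#define DEFINE_COPY(T, name)                                    \
  void name (T *d, const T *s)                                  \
  {                                                             \
    for (size_t i = 0; i != VECTOR_BYTES / sizeof *s; ++i)      \
      d[i] = s[i];                                              \
  }

DEFINE_COPY (char, copy_char)    /* 16 iterations */
DEFINE_COPY (short, copy_short)  /* 8 iterations with a 2-byte short */
DEFINE_COPY (int, copy_int)      /* 4 iterations with a 4-byte int */

/* Copy from a source deliberately offset by one element from a
   16-byte boundary.  Returns 1 if the destination matches.  */
int check_unaligned_short_copy (void)
{
  enum { N = VECTOR_BYTES / sizeof (short) };
  short src[N + 1] __attribute__ ((aligned (16)));
  short dst[N] __attribute__ ((aligned (16)));

  for (size_t i = 0; i != N + 1; ++i)
    src[i] = (short) (i * 3 + 1);
  memset (dst, 0, sizeof dst);

  copy_short (dst, src + 1);   /* src + 1 is not 16-byte aligned */
  return memcmp (dst, src + 1, sizeof dst) == 0;
}
```

Each generated function moves the same 16 bytes; only the trip count changes
with the element size, which is the point of writing the bound as
16 / sizeof *s.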
I'm not sure it makes sense to expect unaligned copies involving a single
vector register's worth of data to be vectorized (as done in my proposed tests
for char and short), but I would expect larger unaligned copies (i.e.,
multiples of 16 bytes) to benefit from vectorization. In my experiments I've
seen no evidence of GCC attempting to vectorize such copies, but I need to do
some more research to understand why.

(In reply to comment #23)
The test uses -maltivec and that's what I've been using as well. But I see in
the Power ISA book that lxvw4x and stxvw4x are classified as VSX instructions,
so perhaps they shouldn't be emitted without -mvsx, although 5.0 doesn't emit
them even with -mvsx.