From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2 "basic block vectorized using SLP" 1
Date: Thu, 26 Feb 2015 10:19:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Status  |NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener ---
Looking at the original description - note that the copying cannot be
optimized away, because the accesses are to global variables (well, unless
you build with -flto or -fwhole-program, which will privatize the symbols).
But of course the "correctness" test is optimized away very early.  So the
testcase should get a

  __asm__ volatile ("" : : : "memory");

between the copying and the correctness verification.

Currently vectorization is entered with the IL:

  _8 = MEM[(unsigned int *)&in + 4B];
  MEM[(unsigned int *)&out] = _8;
  _14 = MEM[(unsigned int *)&in + 8B];
  MEM[(unsigned int *)&out + 4B] = _14;
  _20 = MEM[(unsigned int *)&in + 12B];
  MEM[(unsigned int *)&out + 8B] = _20;
  _26 = MEM[(unsigned int *)&in + 16B];
  MEM[(unsigned int *)&out + 12B] = _26;
  return 0;

(see - no check anymore).  We generate (with -mcpu=e6500 -m64 -maltivec
-mabi=altivec, just to pick one example):

  vect__2.12_11 = __builtin_altivec_mask_for_load (&MEM[(unsigned int *)&in + 4B]);
  vectp.14_13 = &MEM[(unsigned int *)&in + 4B] & -16B;
  vect__2.15_14 = MEM[(unsigned int *)vectp.14_13];
  vectp.14_16 = &MEM[(void *)&in + 16B] & -16B;
  vect__2.16_17 = MEM[(unsigned int *)vectp.14_16];
  vect__2.17_18 = REALIGN_LOAD <vect__2.15_14, vect__2.16_17, vect__2.12_11>;
  MEM[(unsigned int *)&out] = vect__2.17_18;
  return 0;

and

  (insn 16 15 17 (set (subreg:DI (reg:V4SI 171 [ vect__2.15 ]) 8)
          (mem:DI (plus:DI (reg:DI 170)
                  (const_int 8 [0x8])) [1 MEM[(unsigned int *)&MEM[(unsigned int *)&in + 4B] & -16B]+8 S8 A32])) t.c:14 -1
       (nil))
  (insn 17 16 18 (set (subreg:DI (reg:V4SI 171 [ vect__2.15 ]) 0)
          (mem:DI (reg:DI 170) [1 MEM[(unsigned int *)&MEM[(unsigned int *)&in + 4B] & -16B]+0 S8 A32])) t.c:14 -1
       (nil))
  (insn 21 20 22 (set (reg:V4SI 176)
          (mem:V4SI (reg:DI 174) [1 MEM[(unsigned int *)&MEM[(void *)&in + 16B] & -16B]+0 S16 A128])) t.c:14 -1
       (nil))

so for some reason we expand the first aligned vector load using two DImode
loads.  Investigating.
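For reference, a minimal sketch of the suggested fix (the actual
costmodel-bb-slp-9a.c differs in its details; the array names, sizes and
the unrolled shape here are only illustrative, chosen to match the IL
above):

  #define N 4
  unsigned int in[N + 1];
  unsigned int out[N];

  int
  main1 (void)
  {
    /* The copies the SLP pass should vectorize.  */
    out[0] = in[1];
    out[1] = in[2];
    out[2] = in[3];
    out[3] = in[4];

    /* Compiler barrier: forces the stores to memory and makes the
       optimizers forget memory contents, so the check below cannot be
       folded away against the copy.  */
    __asm__ volatile ("" : : : "memory");

    /* Correctness verification, now kept alive.  */
    if (out[0] != in[1] || out[1] != in[2]
        || out[2] != in[3] || out[3] != in[4])
      __builtin_abort ();

    return 0;
  }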
I have a fix which ends up producing

.L.main1:
        addis 9,2,.LANCHOR0@toc@ha
        li 3,0
        addi 9,9,.LANCHOR0@toc@l
        addi 10,9,4
        addi 9,9,16
        neg 8,10
        lvx 0,0,9
        lvsr 13,0,8
        addis 9,2,.LANCHOR1@toc@ha
        lvx 1,0,10
        addi 9,9,.LANCHOR1@toc@l
        vperm 0,1,0,13
        stvx 0,0,9
        blr

Not sure whether that is the same as with 4.8, though (I don't have a cross
ready to verify), but the RTL looks good.
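For readers unfamiliar with the sequence: lvx/lvsr/vperm is the classic
AltiVec misaligned-load idiom - two aligned loads around the address,
merged with a permute.  A sketch in intrinsics, using the textbook lvsl
form (the generated code above instead derives the permute control with
lvsr of the negated address, an equivalent formulation; realign_load is an
illustrative name, not something from the testcase):

  #include <altivec.h>

  /* Load a potentially misaligned vector of four unsigned ints.  */
  static vector unsigned int
  realign_load (const unsigned int *p)
  {
    vector unsigned char mask = vec_lvsl (0, p); /* permute control from the low address bits */
    vector unsigned int msq = vec_ld (0, p);     /* aligned quadword at or below p */
    vector unsigned int lsq = vec_ld (15, p);    /* next aligned quadword */
    return vec_perm (msq, lsq, mask);            /* select the 16 bytes starting at p */
  }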