From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug testsuite/63175] [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2 "basic block vectorized using SLP" 1
Date: Thu, 26 Feb 2015 10:19:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63175

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Status  |NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener ---
Looking at the original description - note that the copying cannot be
optimized away, because the accesses are to global variables (well, unless
you build with -flto or -fwhole-program, which will privatize the symbols).
But of course the "correctness" test is optimized away very early.  So the
testcase should get a

  __asm__ volatile ("" : : : "memory");

between the copying and the correctness verification.

Currently vectorization is entered with the IL:

  _8 = MEM[(unsigned int *)&in + 4B];
  MEM[(unsigned int *)&out] = _8;
  _14 = MEM[(unsigned int *)&in + 8B];
  MEM[(unsigned int *)&out + 4B] = _14;
  _20 = MEM[(unsigned int *)&in + 12B];
  MEM[(unsigned int *)&out + 8B] = _20;
  _26 = MEM[(unsigned int *)&in + 16B];
  MEM[(unsigned int *)&out + 12B] = _26;
  return 0;

(see - no check anymore).  We generate (with -mcpu=e6500 -m64 -maltivec
-mabi=altivec, just to pick one example):

  vect__2.12_11 = __builtin_altivec_mask_for_load (&MEM[(unsigned int *)&in + 4B]);
  vectp.14_13 = &MEM[(unsigned int *)&in + 4B] & -16B;
  vect__2.15_14 = MEM[(unsigned int *)vectp.14_13];
  vectp.14_16 = &MEM[(void *)&in + 16B] & -16B;
  vect__2.16_17 = MEM[(unsigned int *)vectp.14_16];
  vect__2.17_18 = REALIGN_LOAD <vect__2.15_14, vect__2.16_17, vect__2.12_11>;
  MEM[(unsigned int *)&out] = vect__2.17_18;
  return 0;

and

  (insn 16 15 17 (set (subreg:DI (reg:V4SI 171 [ vect__2.15 ]) 8)
          (mem:DI (plus:DI (reg:DI 170)
                  (const_int 8 [0x8])) [1 MEM[(unsigned int *)&MEM[(unsigned int *)&in + 4B] & -16B]+8 S8 A32])) t.c:14 -1
       (nil))
  (insn 17 16 18 (set (subreg:DI (reg:V4SI 171 [ vect__2.15 ]) 0)
          (mem:DI (reg:DI 170) [1 MEM[(unsigned int *)&MEM[(unsigned int *)&in + 4B] & -16B]+0 S8 A32])) t.c:14 -1
       (nil))
  (insn 21 20 22 (set (reg:V4SI 176)
          (mem:V4SI (reg:DI 174) [1 MEM[(unsigned int *)&MEM[(void *)&in + 16B] & -16B]+0 S16 A128])) t.c:14 -1
       (nil))

so for some reason we expand the first aligned vector load using two DImode
loads.  Investigating.
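For reference, a minimal sketch of the suggested fix (the actual
costmodel-bb-slp-9a.c differs in its details; the array names, sizes and
the unrolled shape here are only illustrative, chosen to match the IL
above):

  #define N 4
  unsigned int in[N + 1];
  unsigned int out[N];

  int
  main1 (void)
  {
    /* The copies the SLP pass should vectorize.  */
    out[0] = in[1];
    out[1] = in[2];
    out[2] = in[3];
    out[3] = in[4];

    /* Compiler barrier: forces the stores to memory and makes the
       optimizers forget memory contents, so the check below cannot be
       folded away against the copy.  */
    __asm__ volatile ("" : : : "memory");

    /* Correctness verification, now kept alive.  */
    if (out[0] != in[1] || out[1] != in[2]
        || out[2] != in[3] || out[3] != in[4])
      __builtin_abort ();

    return 0;
  }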
I have a fix which ends up producing

.L.main1:
        addis 9,2,.LANCHOR0@toc@ha
        li 3,0
        addi 9,9,.LANCHOR0@toc@l
        addi 10,9,4
        addi 9,9,16
        neg 8,10
        lvx 0,0,9
        lvsr 13,0,8
        addis 9,2,.LANCHOR1@toc@ha
        lvx 1,0,10
        addi 9,9,.LANCHOR1@toc@l
        vperm 0,1,0,13
        stvx 0,0,9
        blr

Not sure whether that is the same as with 4.8, though (I don't have a cross
ready to verify), but the RTL looks good.
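For readers unfamiliar with the sequence: lvx/lvsr/vperm is the classic
AltiVec misaligned-load idiom - two aligned loads around the address,
merged with a permute.  A sketch in intrinsics, using the textbook lvsl
form (the generated code above instead derives the permute control with
lvsr of the negated address, an equivalent formulation; realign_load is an
illustrative name, not something from the testcase):

  #include <altivec.h>

  /* Load a potentially misaligned vector of four unsigned ints.  */
  static vector unsigned int
  realign_load (const unsigned int *p)
  {
    vector unsigned char mask = vec_lvsl (0, p); /* permute control from the low address bits */
    vector unsigned int msq = vec_ld (0, p);     /* aligned quadword at or below p */
    vector unsigned int lsq = vec_ld (15, p);    /* next aligned quadword */
    return vec_perm (msq, lsq, mask);            /* select the 16 bytes starting at p */
  }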