public inbox for gcc-bugs@sourceware.org
* [Bug testsuite/96098] New: [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails since r11-205
From: seurer at linux dot vnet.ibm.com @ 2020-07-07 16:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96098

            Bug ID: 96098
           Summary: [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails
                    since r11-205
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at linux dot vnet.ibm.com
  Target Milestone: ---

g:bc484e250990393e887f7239157cc85ce6fadcce, r11-205

make -k check-gcc RUNTESTFLAGS=vect.exp=gcc.dg/vect/bb-slp-pr68892.c

FAIL: gcc.dg/vect/bb-slp-pr68892.c scan-tree-dump-times slp2 "Basic block will be vectorized" 1
FAIL: gcc.dg/vect/bb-slp-pr68892.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "Basic block will be vectorized" 1

# of expected passes            4
# of unexpected failures        2

Seen on powerpc64, both BE and LE, and on all Power versions.


* [Bug testsuite/96098] [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails since r11-205
From: rguenth at gcc dot gnu.org @ 2020-07-08  7:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96098

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-08
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Target Milestone|---                         |11.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The testcase probably needs to move to costmodel/*/ because its outcome now
depends on the actual costing.  On x86_64:

0x54db890 _1 2 times vector_store costs 24 in body
0x54db890 <unknown> 1 times vec_construct costs 8 in prologue
0x54db890 <unknown> 1 times vec_construct costs 8 in prologue
0x54dcba0 _1 1 times scalar_store costs 12 in body
0x54dcba0 _2 1 times scalar_store costs 12 in body
0x54dcba0 _3 1 times scalar_store costs 12 in body
0x54dcba0 _4 1 times scalar_store costs 12 in body

while ppc64le has

0x42edf00 _1 2 times vector_store costs 2 in body
0x42edf00 <unknown> 1 times vec_construct costs 2 in prologue
0x42edf00 <unknown> 1 times vec_construct costs 2 in prologue
0x42ef850 _1 1 times scalar_store costs 1 in body
0x42ef850 _2 1 times scalar_store costs 1 in body
0x42ef850 _3 1 times scalar_store costs 1 in body
0x42ef850 _4 1 times scalar_store costs 1 in body

so for ppc64le it's 6 vector vs. 4 scalar while on x86_64 it's 36 vector
vs. 48 scalar.  As the comment in the testcase explains the vectorization
is considered a "bug" (well, I'd say if write-combining is profitable
we should of course do it):

/* ???  Due to the gaps we fall back to scalar loads which makes the
   vectorization profitable.  */
/* { dg-final { scan-tree-dump "not profitable" "slp2" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "BB vectorization with gaps at the end of a load is not supported" 1 "slp2" } } */
/* { dg-final { scan-tree-dump-times "Basic block will be vectorized" 1 "slp2" } } */
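
For reference, the totals can be recomputed from the dump lines with a few
lines of C.  This is only a sketch: the struct and function names are made up
here, and it assumes the "costs" column is already the total for the line
(count times unit cost).  Summing the three ppc64le vector lines gives 6
against 4 scalar, as quoted.  A plain sum of the x86_64 vector lines gives 40
rather than 36, so the quoted figure presumably reflects an adjustment this
sketch does not model; either way it is below the scalar total of 48.

```c
#include <stddef.h>

/* One line of the cost dump (hypothetical representation). */
struct cost_entry {
    const char *kind;  /* e.g. "vector_store" */
    int count;         /* the "N times" column */
    int cost;          /* the "costs N" column, assumed to be the line total */
};

/* Sum the "costs" column over n dump lines. */
int total_cost(const struct cost_entry *e, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += e[i].cost;
    return sum;
}

/* Nonzero when the vector total is strictly lower than the scalar total. */
int vector_is_cheaper(int vector_total, int scalar_total)
{
    return vector_total < scalar_total;
}

/* Numbers copied from the ppc64le dump above. */
const struct cost_entry ppc64le_vector[] = {
    { "vector_store",  2, 2 },
    { "vec_construct", 1, 2 },
    { "vec_construct", 1, 2 },
};
const struct cost_entry ppc64le_scalar[] = {
    { "scalar_store", 1, 1 }, { "scalar_store", 1, 1 },
    { "scalar_store", 1, 1 }, { "scalar_store", 1, 1 },
};

/* Numbers copied from the x86_64 dump above. */
const struct cost_entry x86_64_vector[] = {
    { "vector_store",  2, 24 },
    { "vec_construct", 1, 8 },
    { "vec_construct", 1, 8 },
};
const struct cost_entry x86_64_scalar[] = {
    { "scalar_store", 1, 12 }, { "scalar_store", 1, 12 },
    { "scalar_store", 1, 12 }, { "scalar_store", 1, 12 },
};
```

With these tables, vector_is_cheaper() is false for ppc64le and true for
x86_64, which matches the test passing on one target and failing on the other.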

on x86_64 we get

        movsd   a+2048(%rip), %xmm0
        movsd   a(%rip), %xmm1
        movhpd  a+3072(%rip), %xmm0
        movhpd  a+1024(%rip), %xmm1
        movaps  %xmm1, b(%rip)
        movaps  %xmm0, b+16(%rip)

vs.

        movsd   a(%rip), %xmm0
        movsd   %xmm0, b(%rip)
        movsd   a+1024(%rip), %xmm0
        movsd   %xmm0, b+8(%rip)
        movsd   a+2048(%rip), %xmm0
        movsd   %xmm0, b+16(%rip)
        movsd   a+3072(%rip), %xmm0
        movsd   %xmm0, b+24(%rip)

where it looks profitable (larger stores are also always good for STLF)
while on ppc64le we have

0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     foo,.-foo
        addis 9,2,.LANCHOR0+1024@toc@ha
        lfd 10,.LANCHOR0+1024@toc@l(9)
        addis 9,2,.LANCHOR0+2048@toc@ha
        lfd 11,.LANCHOR0+2048@toc@l(9)
        addis 9,2,.LANCHOR0+3072@toc@ha
        lfd 12,.LANCHOR0+3072@toc@l(9)
        addis 9,2,.LANCHOR0+4096@toc@ha
        lfd 0,.LANCHOR0+4096@toc@l(9)
        addis 9,2,.LANCHOR0@toc@ha
        stfd 10,.LANCHOR0@toc@l(9)
        addis 9,2,.LANCHOR0+8@toc@ha
        stfd 11,.LANCHOR0+8@toc@l(9)
        addis 9,2,.LANCHOR0+16@toc@ha
        stfd 12,.LANCHOR0+16@toc@l(9)
        addis 9,2,.LANCHOR0+24@toc@ha
        stfd 0,.LANCHOR0+24@toc@l(9)
        blr

vs (cost model disabled):

0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     foo,.-foo
        addis 9,2,.LANCHOR0+2048@toc@ha
        addis 8,2,.LANCHOR0@toc@ha
        li 10,16
        lfd 10,.LANCHOR0+2048@toc@l(9)
        lfd 11,.LANCHOR0@toc@l(8)
        addis 9,2,.LANCHOR0+3072@toc@ha
        addis 8,2,.LANCHOR0+1024@toc@ha
        lfd 12,.LANCHOR0+3072@toc@l(9)
        lfd 0,.LANCHOR0+1024@toc@l(8)
        addis 9,2,.LANCHOR0+131072@toc@ha
        addi 9,9,.LANCHOR0+131072@toc@l
        xxpermdi 12,10,12,0
        xxpermdi 0,11,0,0
        stxvd2x 12,9,10
        stxvd2x 0,0,9
        blr

Both look comparatively ugly due to the many .LANCHOR address computations.
I'd have expected a lea of &a[0][0] and then offset addressing from that,
which would at least avoid a ton of relocations.  It looks like 131072
wouldn't fit in the 16-bit offset though.  Anyway, that's off-topic.  Whether
the xxpermdi makes it unprofitable to vectorize is not known to me.
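
The offset arithmetic can be checked with a small C sketch.  It assumes that
131072 is the byte size of a[128][128] (so b would sit right after a behind
the same anchor), that double is 8 bytes, and that the displacement field is
a signed 16-bit value; the function names are made up for illustration.

```c
#include <stdint.h>

/* Byte size of "double a[128][128]" from the testcase. */
long array_a_bytes(void)
{
    return 128L * 128L * (long) sizeof(double);
}

/* Nonzero if off is representable as a signed 16-bit displacement. */
int fits_signed_16(long off)
{
    return off >= INT16_MIN && off <= INT16_MAX;
}
```

Under those assumptions the offsets 1024, 2048 and 3072 into a fit, while
the 131072 needed to reach b from &a[0][0] does not.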


* [Bug testsuite/96098] [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails since r11-205
From: rguenth at gcc dot gnu.org @ 2020-10-16 11:55 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96098

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED


* [Bug testsuite/96098] [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails since r11-205
From: rguenth at gcc dot gnu.org @ 2021-01-15 12:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96098

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The testcase morphed in a way that it no longer tests what it was originally
supposed to, and slightly altering it shows the original issue isn't fixed
(anymore).  The limit set as a result of PR91403 (and dups) prevents the issue
for larger arrays but the testcase has

double a[128][128];

which results in a group size of "just" 512 (the limit is 4096).  Avoiding
the 'BB vectorization with gaps at the end of a load is not supported'
by altering it to do

void foo(void)
{
  b[0] = a[0][0];
  b[1] = a[1][0];
  b[2] = a[2][0];
  b[3] = a[3][127];
}

shows that costing has improved further to not account for the dead loads,
making the previous test ineffective.  In fact the underlying issue isn't
fixed (we do code-generate dead loads).
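
As a rough check of the numbers, the span of the altered accesses can be
computed in C.  This is a sketch that assumes the group size corresponds to
the row-major element span from the lowest to the highest access; the helper
names are made up, and the limit of 4096 is simply the figure quoted above.

```c
enum { COLS = 128, GROUP_LIMIT = 4096 };

/* Flat row-major element index of a[row][col] in double a[128][128]. */
long flat_index(long row, long col)
{
    return row * COLS + col;
}

/* Number of elements from the lowest to the highest accessed index,
   inclusive, for n accesses a[rows[i]][cols[i]].  Requires n >= 1. */
long group_span(const long rows[], const long cols[], int n)
{
    long lo = flat_index(rows[0], cols[0]);
    long hi = lo;
    for (int i = 1; i < n; i++) {
        long idx = flat_index(rows[i], cols[i]);
        if (idx < lo)
            lo = idx;
        if (idx > hi)
            hi = idx;
    }
    return hi - lo + 1;
}

/* The four loads of the altered foo(): a[0][0], a[1][0], a[2][0], a[3][127]. */
const long altered_rows[] = { 0, 1, 2, 3 };
const long altered_cols[] = { 0, 0, 0, 127 };
```

For the altered accesses this gives 512, well under 4096, so the cap does not
kick in for this testcase.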

In fact the vector permute load is even profitable; only the excessive
code-generation issue exists (and is "fixed" by capping it at a constant
boundary, which is just too high for this particular testcase).

The testcase now has "dups", so I'll simply remove it.


* [Bug testsuite/96098] [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails since r11-205
From: cvs-commit at gcc dot gnu.org @ 2021-01-15 12:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96098

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:cb60334b7162ec5ae560be482cd7a33402470bb4

commit r11-6710-gcb60334b7162ec5ae560be482cd7a33402470bb4
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jan 15 13:31:28 2021 +0100

    testsuite/96098 - remove redundant testcase

    The testcase morphed in a way no longer testing what it was originally
    supposed to do and slightly altering it shows the original issue isn't
    fixed (anymore).
    The limit as set as result of PR91403 (and dups) prevents the issue for
    larger arrays but the testcase has

    double a[128][128];

    which results in a group size of "just" 512 (the limit is 4096).  Avoiding
    the 'BB vectorization with gaps at the end of a load is not supported'
    by altering it to do

    void foo(void)
    {
      b[0] = a[0][0];
      b[1] = a[1][0];
      b[2] = a[2][0];
      b[3] = a[3][127];
    }

    shows that costing has improved further to not account the dead loads
    making the previous test inefficient.  In fact the underlying issue isn't
    fixed (we do code-generate dead loads).

    In fact the vector permute load is even profitable, just the excessive
    code-generation issue exists (and is "fixed" by capping it a constant
    boundary, just too high for this particular testcase).

    The testcase now has "dups", so I'll simply remove it.

    2021-01-15  Richard Biener  <rguenther@suse.de>

            PR testsuite/96098
            * gcc.dg/vect/bb-slp-pr68892.c: Remove.


* [Bug testsuite/96098] [11 regression] gcc.dg/vect/bb-slp-pr68892.c fails since r11-205
From: rguenth at gcc dot gnu.org @ 2021-01-15 12:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96098

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

