[Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test

public inbox for gcc-bugs@sourceware.org

* [Bug target/40648]  New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
@ 2009-07-04 11:54 ubizjak at gmail dot com
  2009-07-04 12:06 ` [Bug target/40648] " rguenth at gcc dot gnu dot org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-07-04 11:54 UTC (permalink / raw)
  To: gcc-bugs

Hello!

The "[patch, vectorizer] misaligned store support" patch [1] resulted in more
than 10% longer execution time for Polyhedron test_fpu test on Core2.

The test is compiled with "-march=x86-64 -ffast-math -funroll-loops -O3",
results are:

time ./a.out
  Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts  2.5 sec  Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts  2.5 sec  Err= 0.000000000000015
Test3 - Crout  2 (1001x1001) inverts  2.3 sec  Err= 0.000000000000065
Test4 - Lapack 2 (1001x1001) inverts  2.4 sec  Err= 0.000000000000250
                             total =  9.6 sec


real    0m9.864s
user    0m9.778s
sys     0m0.074s

with patch [1] included, vs.:

time ./a.out
  Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts  1.9 sec  Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts  2.5 sec  Err= 0.000000000000015
Test3 - Crout  2 (1001x1001) inverts  2.3 sec  Err= 0.000000000000065
Test4 - Lapack 2 (1001x1001) inverts  2.0 sec  Err= 0.000000000000250
                             total =  8.6 sec


real    0m8.869s
user    0m8.788s
sys     0m0.068s

when patch [1] is reverted.
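
As a quick check of the "more than 10%" figure, the relative slowdown can be computed from the timings quoted above. This is only a small sketch; the helper name slowdown_percent is made up for illustration and is not part of the benchmark.

```c
#include <assert.h>

/* Relative slowdown of `slow` with respect to `fast`, in percent.
   Both arguments are run times in seconds and must be positive.
   The function name is hypothetical, chosen for this illustration. */
double slowdown_percent(double slow, double fast)
{
    assert(slow > 0.0 && fast > 0.0);
    return (slow - fast) / fast * 100.0;
}
```

Called with the user times above (9.778 s with the patch, 8.788 s with it reverted), this gives a slowdown of roughly 11%, and the benchmark's own totals (9.6 s vs. 8.6 s) give a similar figure.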

The compiler is from today's SVN, "xgcc (GCC) 4.5.0 20090704 (experimental)
[trunk revision 149223]".

The effect of this patch can also be seen on [2], see test_fpu chart between
2009-06-05 and 2009-06-06.

[1] http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00492.html
[2] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html


-- 
           Summary: misaligned store vectorizer patch introduced 10% runtime
                    regression on Polyhedron test_fpu
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
@ 2009-07-04 12:06 ` rguenth at gcc dot gnu dot org
  2009-07-04 12:33 ` rguenth at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-04 12:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-07-04 12:05 -------
Can you check numbers with vectorization disabled?  I see the regression as
well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
as aligned moves (if the data turns out to be aligned).  Which means the
data really is unaligned.

What is the difference in code generation?  Can you create a testcase from
the hot loop(s)?


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
  2009-07-04 12:06 ` [Bug target/40648] " rguenth at gcc dot gnu dot org
@ 2009-07-04 12:33 ` rguenth at gcc dot gnu dot org
  2009-07-04 12:36 ` rguenth at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-04 12:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-07-04 12:33 -------
One loop is

   139  0.0046 :               DO l = 1 , K
   622  0.0208 :                  IF ( B(l,j)/=ZERO ) THEN
               :                     temp = Alpha*B(l,j)
 21380  0.7146 :                     DO i = 1 , M
569348 19.0299 :                        C(i,j) = C(i,j) + temp*A(i,l)
               :                     ENDDO
               :                  ENDIF
               :               ENDDO

where the accesses to C(i,j) and A(i,l) are all unaligned.  As the number of
iterations is symbolic, we use epilogue peeling.  Instead, we could have
peeled to align the loads/stores from/to C.


      SUBROUTINE DGEMM(M,N,K,Alpha,A,Lda,B,Ldb,Beta,C,Ldc)
      IMPLICIT NONE
      DOUBLE PRECISION , PARAMETER :: ONE = 1.0D+0 , ZERO = 0.0D+0
      DOUBLE PRECISION :: Alpha , Beta
      INTEGER :: K , Lda , Ldb , Ldc , M , N
      DOUBLE PRECISION , DIMENSION(Lda,*) :: A
      DOUBLE PRECISION , DIMENSION(Ldb,*) :: B
      DOUBLE PRECISION , DIMENSION(Ldc,*) :: C
      INTENT (IN) A , Alpha , B , Beta , K , Lda , Ldb , Ldc , M , N
      INTENT (INOUT) C
      INTEGER :: i , j , l
      DOUBLE PRECISION :: temp
            DO j = 1 , N
               IF ( Beta==ZERO ) THEN
                  DO i = 1 , M
                     C(i,j) = ZERO
                  ENDDO
               ELSEIF ( Beta/=ONE ) THEN
                  DO i = 1 , M
                     C(i,j) = Beta*C(i,j)
                  ENDDO
               ENDIF
               DO l = 1 , K
                  IF ( B(l,j)/=ZERO ) THEN
                     temp = Alpha*B(l,j)
                     DO i = 1 , M
                        C(i,j) = C(i,j) + temp*A(i,l)
                     ENDDO
                  ENDIF
               ENDDO
            ENDDO
      END SUBROUTINE DGEMM
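
For illustration, the innermost loop with peeling for alignment of the accesses to C might look roughly like the following in C. This is a hand-written sketch, not what the vectorizer emits: the function name axpy_peeled is hypothetical, the 16-byte vector width (two doubles) is assumed, and the two-element main loop merely stands in for one vector operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c[i] += temp * a[i] for i in [0, m), structured as a loop peeled for
   alignment of the accesses to c.  Assumes 16-byte vectors (two doubles)
   and that c is at least 8-byte aligned.  The name is hypothetical. */
void axpy_peeled(double *c, const double *a, double temp, size_t m)
{
    size_t i = 0;

    assert(((uintptr_t)c % sizeof(double)) == 0);

    /* Prologue: scalar iterations until &c[i] is 16-byte aligned. */
    while (i < m && ((uintptr_t)&c[i] % 16) != 0) {
        c[i] += temp * a[i];
        i++;
    }

    /* Main loop: two doubles per iteration.  Loads and stores of c are
       now 16-byte aligned; loads from a may still be misaligned. */
    for (; i + 2 <= m; i += 2) {
        c[i]     += temp * a[i];
        c[i + 1] += temp * a[i + 1];
    }

    /* Epilogue: at most one leftover scalar iteration. */
    for (; i < m; i++)
        c[i] += temp * a[i];
}
```

With this shape only the loads from A can remain misaligned in the main loop, whereas the epilogue-only variant leaves both the loads and the stores of C misaligned.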


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-07-04 12:33:20
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
  2009-07-04 12:06 ` [Bug target/40648] " rguenth at gcc dot gnu dot org
  2009-07-04 12:33 ` rguenth at gcc dot gnu dot org
@ 2009-07-04 12:36 ` rguenth at gcc dot gnu dot org
  2009-07-04 12:44 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-04 12:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2009-07-04 12:36 -------
Tuned for Core2 I get for the innermost loop

.L19:
        leal    (%eax,%ebx), %edx
        movsd   (%eax,%ecx), %xmm1
        movsd   (%edx), %xmm7
        movhpd  8(%eax,%ecx), %xmm1
        movhpd  8(%edx), %xmm7
        movapd  %xmm1, %xmm0
        incl    %esi
        mulpd   %xmm3, %xmm0
        addl    $16, %eax
        addpd   %xmm7, %xmm0
        cmpl    %edi, %esi
        movlpd  %xmm0, (%edx)
        movhpd  %xmm0, 8(%edx)
        jb      .L19

which is slower than with vectorization disabled (which is what happened
before the patch?).
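
To make the shape of that loop easier to follow, here is a rough C rendering with SSE2 intrinsics, where each 16-byte load and store is split into two 8-byte halves like the movsd/movhpd and movlpd/movhpd pairs above. It is only a sketch: the function name axpy_split is hypothetical, the element count is assumed to be even, and a scalar fallback is used when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* c[i] += temp * a[i], two doubles at a time, with every vector load and
   store done as two 8-byte halves.  m must be even in this sketch.
   The function name is hypothetical. */
void axpy_split(double *c, const double *a, double temp, size_t m)
{
    assert(m % 2 == 0);
#if defined(__SSE2__)
    const __m128d t = _mm_set1_pd(temp);
    for (size_t i = 0; i < m; i += 2) {
        /* Low half via a scalar load, high half merged in afterwards. */
        __m128d va = _mm_loadh_pd(_mm_load_sd(&a[i]), &a[i + 1]);
        __m128d vc = _mm_loadh_pd(_mm_load_sd(&c[i]), &c[i + 1]);
        __m128d r  = _mm_add_pd(vc, _mm_mul_pd(va, t));
        _mm_storel_pd(&c[i], r);      /* store low half  */
        _mm_storeh_pd(&c[i + 1], r);  /* store high half */
    }
#else
    /* Scalar fallback so the sketch still builds without SSE2. */
    for (size_t i = 0; i < m; i++)
        c[i] += temp * a[i];
#endif
}
```

Per pair of doubles this issues four loads and two stores for one multiply and one add, so it is plausible that the memory operations dominate the loop.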


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (2 preceding siblings ...)
  2009-07-04 12:36 ` rguenth at gcc dot gnu dot org
@ 2009-07-04 12:44 ` ubizjak at gmail dot com
  2009-07-04 13:40 ` ubizjak at gmail dot com
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-07-04 12:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from ubizjak at gmail dot com  2009-07-04 12:43 -------
(In reply to comment #1)
> Can you check numbers with vectorization disabled?  I see the regression as
> well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
> as aligned moves (if the data turns out to be aligned).  Which means the
> data really is unaligned.

Without vectorisation (adding -fno-tree-vectorize), there is no difference:

time ./a.out
  Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts  2.2 sec  Err= 0.000000000000006
Test2 - Crout 2000 (101x101) inverts  2.5 sec  Err= 0.000000000000023
Test3 - Crout  2 (1001x1001) inverts  2.6 sec  Err= 0.000000000000031
Test4 - Lapack 2 (1001x1001) inverts  2.2 sec  Err= 0.000000000000250
                             total =  9.5 sec


real    0m9.760s
user    0m9.674s
sys     0m0.082s


> What is the difference in code generation?  Can you create a testcase from
> the hot loop(s)?


The first thing to catch my eye is the missing CSE on memory references in the
_MAIN loop (added -march=barcelona to generate movupd insns):


.L14:
        movupd  equiv.0.1551(%rdx,%rax), %xmm8  #, tmp720
        movupd  a3.1557(%rdx,%rax), %xmm7       #, tmp722
        leaq    16(%rdx), %r10  #, tmp789
        movupd  equiv.0.1551(%r10,%rax), %xmm6  #, tmp867
        movupd  a3.1557(%r10,%rax), %xmm5       #, tmp869
        leaq    32(%rdx), %rcx  #, ivtmp.845
        subpd   %xmm8, %xmm7    # tmp720, tmp722
        movupd  equiv.0.1551(%rcx,%rax), %xmm3  #, tmp873
        movupd  a3.1557(%rcx,%rax), %xmm4       #, tmp875
        subpd   %xmm6, %xmm5    # tmp867, tmp869
        leaq    48(%rdx), %r9   #, ivtmp.845
        leaq    64(%rdx), %r8   #, ivtmp.845
        subpd   %xmm3, %xmm4    # tmp873, tmp875
        movupd  a3.1557(%r9,%rax), %xmm15       #, tmp881
        movupd  equiv.0.1551(%r9,%rax), %xmm2   #, tmp879
        movupd  a3.1557(%r8,%rax), %xmm13       #, tmp887
        movupd  equiv.0.1551(%r8,%rax), %xmm14  #, tmp885

and in regressed case:

.L13:
        movupd  (%rdx,%rax), %xmm15     #* ivtmp.792, tmp962
        movupd  (%r12,%rax), %xmm14     #* ivtmp.792, tmp964
        leaq    16(%rax), %rsi  #, tmp1050
        movupd  (%rdx,%rsi), %xmm13     #, tmp1114
        movupd  (%r12,%rsi), %xmm12     #, tmp1116
        leaq    32(%rax), %r15  #, ivtmp.792
        subpd   %xmm15, %xmm14  # tmp962, tmp964
        movupd  (%rdx,%r15), %xmm11     #* ivtmp.792, tmp1120
        movupd  (%r12,%r15), %xmm10     #* ivtmp.792, tmp1122
        subpd   %xmm13, %xmm12  # tmp1114, tmp1116
        leaq    48(%rax), %rcx  #, ivtmp.792
        leaq    64(%rax), %r13  #, ivtmp.792
        subpd   %xmm11, %xmm10  # tmp1120, tmp1122
        movupd  (%r12,%rcx), %xmm8      #* ivtmp.792, tmp1128
        movupd  (%rdx,%rcx), %xmm9      #* ivtmp.792, tmp1126
        movupd  (%r12,%r13), %xmm6      #* ivtmp.792, tmp1134
        movupd  (%rdx,%r13), %xmm7      #* ivtmp.792, tmp1132


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (3 preceding siblings ...)
  2009-07-04 12:44 ` ubizjak at gmail dot com
@ 2009-07-04 13:40 ` ubizjak at gmail dot com
  2009-07-04 14:02 ` dominiq at lps dot ens dot fr
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-07-04 13:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from ubizjak at gmail dot com  2009-07-04 13:40 -------
(In reply to comment #4)

> and in regressed case:

... in NON-regressed case. The regressed code is the first dump.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (4 preceding siblings ...)
  2009-07-04 13:40 ` ubizjak at gmail dot com
@ 2009-07-04 14:02 ` dominiq at lps dot ens dot fr
  2009-07-05  8:12 ` eres at il dot ibm dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-07-04 14:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from dominiq at lps dot ens dot fr  2009-07-04 14:02 -------
I have seen this problem also. From a crude profiling, it seems that the slow
routines are dgemm, as pointed out in comment #2, and gauss. This is a
regression with respect to 4.4.0 and it started between June 5 and 6.

On i686-apple-darwin9 with "-ffast-math -funroll-loops -O3" but without
specifying -march, the inner loop is unrolled 8 times with 4.4 and only 4 times
with trunk, with a lot of extra code for the memory access (I am not fluent in
x86 assembly). I have compared the outputs of -ftree-vectorizer-verbose=2,
but did not see anything suspicious.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (5 preceding siblings ...)
  2009-07-04 14:02 ` dominiq at lps dot ens dot fr
@ 2009-07-05  8:12 ` eres at il dot ibm dot com
  2009-07-07 15:48 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: eres at il dot ibm dot com @ 2009-07-05  8:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from eres at il dot ibm dot com  2009-07-05 08:12 -------
Testing test_fpu on Power7 with the power7 branch shows no significant
difference between the version compiled with the misaligned store support patch
and without it. (using -mcpu=power7 -ffast-math -funroll-loops -O3)
The version with the misaligned store support patch is ~23% faster than the
version with -fno-tree-vectorize.
So it seems like this is a tuning issue for x86-64 and might be addressed in
the cost model.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (6 preceding siblings ...)
  2009-07-05  8:12 ` eres at il dot ibm dot com
@ 2009-07-07 15:48 ` rguenth at gcc dot gnu dot org
  2009-07-09  7:33 ` eres at il dot ibm dot com
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-07 15:48 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #8 from rguenth at gcc dot gnu dot org  2009-07-07 15:47 -------
The issue is likely the sequence

  load upper half of cache line 1
  load lower half of cache line 2
  store upper half of cache line 1
  store lower half of cache line 2   <---
  load upper half of cache line 2    <---
  load lower half of cache line 3
   ...

where the marked lines probably cause internal delays.

Not using unaligned stores for this kind of data dependence or peeling
for alignment will probably help here.
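
One way to see the overlap in the marked lines is to look at which aligned 16-byte blocks each access touches. The sketch below reads "cache line" in the sequence above as an aligned 16-byte block, which is an assumption about the comment rather than something it states, and it assumes contiguous 16-byte accesses. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VEC_BYTES = 16 };  /* one SSE vector: two doubles */

/* Index of the aligned VEC_BYTES-sized block containing byte address addr. */
uintptr_t block_of(uintptr_t addr)
{
    return addr / VEC_BYTES;
}

/* Returns 1 if the 16-byte store of iteration k and the 16-byte load of
   iteration k+1 touch a common aligned block, 0 otherwise.  `base` is the
   byte address accessed in iteration 0.  Names are hypothetical. */
int store_then_load_share_block(uintptr_t base, size_t k)
{
    assert(k < SIZE_MAX / VEC_BYTES);  /* keep the arithmetic from wrapping */

    uintptr_t store_first = base + k * VEC_BYTES;
    uintptr_t store_last  = store_first + VEC_BYTES - 1;
    uintptr_t load_first  = store_first + VEC_BYTES;

    /* The accesses are contiguous, so they can only share the block that
       holds the end of the store and the start of the next load. */
    return block_of(store_last) == block_of(load_first);
}
```

For an aligned base the function returns 0 for every iteration, while for a base that is off by 8 bytes every store shares a block with the following load, which matches the pattern marked above.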

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648


* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (7 preceding siblings ...)
  2009-07-07 15:48 ` rguenth at gcc dot gnu dot org
@ 2009-07-09  7:33 ` eres at il dot ibm dot com
  2009-10-25 12:41 ` eres at il dot ibm dot com
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: eres at il dot ibm dot com @ 2009-07-09  7:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from eres at il dot ibm dot com  2009-07-09 07:32 -------
> Not using unaligned stores for this kind of data dependence or peeling
> for alignment will probably help here.

The decision of how to vectorize can be changed for x86 (or any other target).
Instead of first checking for misaligned access support in the
vect_enhance_data_refs_alignment () function (tree-vect-data-refs.c),
peeling and versioning can be checked first...
(Please see also the comment at the beginning of this function.)
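
The difference between the two orderings can be sketched as follows. This is not the code of vect_enhance_data_refs_alignment (); every type, field and function name below is hypothetical and only illustrates which option gets checked first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy names; these are not GCC identifiers. */
enum align_strategy {
    STRATEGY_PEEL,        /* peel a prologue so an access becomes aligned */
    STRATEGY_VERSION,     /* run-time alignment check plus two loop versions */
    STRATEGY_MISALIGNED,  /* keep misaligned accesses, rely on target support */
    STRATEGY_NONE         /* no way to handle the misalignment */
};

/* Hypothetical summary of what is possible for a given loop. */
struct align_query {
    bool can_peel;
    bool can_version;
    bool target_supports_misaligned;
};

/* Ordering suggested above: peeling and versioning are checked first. */
enum align_strategy choose_peel_first(const struct align_query *q)
{
    assert(q != NULL);
    if (q->can_peel)
        return STRATEGY_PEEL;
    if (q->can_version)
        return STRATEGY_VERSION;
    if (q->target_supports_misaligned)
        return STRATEGY_MISALIGNED;
    return STRATEGY_NONE;
}

/* Ordering described as the current one: misaligned support is checked first. */
enum align_strategy choose_misaligned_first(const struct align_query *q)
{
    assert(q != NULL);
    if (q->target_supports_misaligned)
        return STRATEGY_MISALIGNED;
    if (q->can_peel)
        return STRATEGY_PEEL;
    if (q->can_version)
        return STRATEGY_VERSION;
    return STRATEGY_NONE;
}
```

On a target that supports misaligned accesses and for a loop that could also be peeled, the first function picks peeling while the second keeps the misaligned accesses.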


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (8 preceding siblings ...)
  2009-07-09  7:33 ` eres at il dot ibm dot com
@ 2009-10-25 12:41 ` eres at il dot ibm dot com
  2009-10-28 10:33 ` ubizjak at gmail dot com
  2009-10-28 10:37 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: eres at il dot ibm dot com @ 2009-10-25 12:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from eres at il dot ibm dot com  2009-10-25 12:41 -------
(In reply to comment #0)
> Hello!
> The "[patch, vectorizer] misaligned store support" patch [1] resulted in more
> than 10% longer execution time for Polyhedron test_fpu test on Core2.
> The test is compiled with "-march=x86-64 -ffast-math -funroll-loops -O3",

This patch wrongly changes the decision of when to peel the loop, which causes
the negative effect on performance.
I am preparing a patch to fix this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (9 preceding siblings ...)
  2009-10-25 12:41 ` eres at il dot ibm dot com
@ 2009-10-28 10:33 ` ubizjak at gmail dot com
  2009-10-28 10:37 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-10-28 10:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from ubizjak at gmail dot com  2009-10-28 10:33 -------
Author: revitale
Date: Tue Oct 27 11:46:07 2009
New Revision: 153590

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153590
Log:
Fix PR40648 -- Fix misaligned store vectorizer patch

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
    trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-25.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-109.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-26.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-27.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-28.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-29.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-33.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-44.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-48.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-50.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-52.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-54.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-56.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-58.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-60.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-70.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-72.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-87.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-88.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-89.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-91.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-92.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-93.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-95.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-96.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
    trunk/gcc/testsuite/gcc.target/powerpc/vsx-vectorize-2.c
    trunk/gcc/testsuite/gcc.target/powerpc/vsx-vectorize-3.c
    trunk/gcc/testsuite/gcc.target/powerpc/vsx-vectorize-4.c
    trunk/gcc/testsuite/gcc.target/powerpc/vsx-vectorize-5.c
    trunk/gcc/testsuite/gcc.target/powerpc/vsx-vectorize-6.c
    trunk/gcc/testsuite/gcc.target/powerpc/vsx-vectorize-7.c
    trunk/gcc/testsuite/gfortran.dg/vect/vect-2.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-3.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-4.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-5.f90
    trunk/gcc/tree-vect-data-refs.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648



* [Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu
  2009-07-04 11:54 [Bug target/40648] New: misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu ubizjak at gmail dot com
                   ` (10 preceding siblings ...)
  2009-10-28 10:33 ` ubizjak at gmail dot com
@ 2009-10-28 10:37 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-10-28 10:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from ubizjak at gmail dot com  2009-10-28 10:36 -------
The patch fixed the regression, see test_fpu chart [1] between
2009-10-27 and 2009-10-28.

[1] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01604.html
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.5.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648


