* [Bug middle-end/38968]  New: Complex matrix product is not vectorized
@ 2009-01-25 16:52 dominiq at lps dot ens dot fr
  2009-01-25 17:33 ` [Bug tree-optimization/38968] " rguenth at gcc dot gnu dot org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-01-25 16:52 UTC (permalink / raw)
  To: gcc-bugs

As shown by the following code, the complex matrix product is not vectorized,
even with the patch in http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01174.html:

program mymatmul
  implicit none
  integer, parameter :: kp = 4
  integer, parameter :: n = 2000
  real(kp), dimension(n,n) :: rr, ri
  complex(kp), dimension(n,n) :: a,b,c
  real :: t1, t2
  integer :: i, j, k

  call random_number (rr)
  call random_number (ri)
  a = cmplx (rr, ri)
  call random_number (rr)
  call random_number (ri)
  b = cmplx (rr, ri)

  call cpu_time (t1)

  c = cmplx (0._kp, 0._kp)
  do j = 1, n
     do k = 1, n
        do i = 1, n
           c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
     end do
  end do

  call cpu_time (t2)
  write (*,'(F8.4)') t2-t1

end program mymatmul

[ibook-dhum] bug/timing% gfc -m64 -O3 -ffast-math -funroll-loops -fomit-frame-pointer -ftree-vectorizer-verbose=2 mymatmul_v_c4.f90
mymatmul_v_c4.f90:22: note: not vectorized: can't calculate alignment for data ref.
mymatmul_v_c4.f90:15: note: not vectorized: complicated access pattern.
mymatmul_v_c4.f90:15: note: not vectorized: can't calculate alignment for data ref.
mymatmul_v_c4.f90:12: note: not vectorized: complicated access pattern.
mymatmul_v_c4.f90:12: note: not vectorized: can't calculate alignment for data ref.
mymatmul_v_c4.f90:1: note: vectorized 0 loops in function.

The corresponding real-valued code is vectorized.


-- 
           Summary: Complex matrix product is not vectorized
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
 GCC build triplet: i686-apple-darwin9
  GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
@ 2009-01-25 17:33 ` rguenth at gcc dot gnu dot org
  2009-01-26 11:15 ` rguenth at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-01-25 17:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-01-25 17:33 -------
Confirmed.  Note the patch mentioned does not try to address any issue present
in the testcase.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|middle-end                  |tree-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2009-01-25 17:33:10
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
  2009-01-25 17:33 ` [Bug tree-optimization/38968] " rguenth at gcc dot gnu dot org
@ 2009-01-26 11:15 ` rguenth at gcc dot gnu dot org
  2009-01-26 13:10 ` irar at il dot ibm dot com
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-01-26 11:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-01-26 11:15 -------
This happens because ivcanon introduces an induction variable that counts
from 2000 to 1.  This "confuses" data-ref analysis and we get

        base_address: a_24(D)
        offset from base address: (<unnamed-signed:64>) ((<unnamed-unsigned:64>) pretmp.28_148 * 16000)
        constant offset from base address: -15996
        step: 8
        aligned to: 128
        base_object: IMAGPART_EXPR <(*a_24(D))[0]>
        symbol tag: SMT.12

notice the negative constant offset from base address.  This in turn
confuses the vectorizer alignment analysis - but only because the alignment
of the base object is known.  We hit (with misalign == -15996, alignment == 16)

  /* Modulo alignment.  */
  misalign = size_binop (TRUNC_MOD_EXPR, misalign, alignment);

  if (!host_integerp (misalign, 1))
    {
      /* Negative or overflowed misalignment value.  */
      if (vect_print_dump_info (REPORT_DETAILS))
        fprintf (vect_dump, "unexpected misalign value");
      return false;
    }

and the modulo is -12.

Now, I wonder why we do not just use alignment + misalign in that case.

I have a patch.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2009-01-25 17:33:10         |2009-01-26 11:15:23
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
  2009-01-25 17:33 ` [Bug tree-optimization/38968] " rguenth at gcc dot gnu dot org
  2009-01-26 11:15 ` rguenth at gcc dot gnu dot org
@ 2009-01-26 13:10 ` irar at il dot ibm dot com
  2009-01-26 13:25 ` rguenth at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: irar at il dot ibm dot com @ 2009-01-26 13:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from irar at il dot ibm dot com  2009-01-26 13:09 -------
(In reply to comment #2)
> Now, I wonder why we do not just use alignment + misalign in that case.

I think you are right.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (2 preceding siblings ...)
  2009-01-26 13:10 ` irar at il dot ibm dot com
@ 2009-01-26 13:25 ` rguenth at gcc dot gnu dot org
  2009-01-26 14:21 ` howarth at nitro dot med dot uc dot edu
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-01-26 13:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2009-01-26 13:25 -------
Patch posted.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |patches/2009-
                   |                            |01/msg01271.html


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (3 preceding siblings ...)
  2009-01-26 13:25 ` rguenth at gcc dot gnu dot org
@ 2009-01-26 14:21 ` howarth at nitro dot med dot uc dot edu
  2009-01-26 14:23 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-01-26 14:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from howarth at nitro dot med dot uc dot edu  2009-01-26 14:21 -------
Is the fix for this PR targeted for gcc 4.4.0 or gcc 4.5 stage 1?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (4 preceding siblings ...)
  2009-01-26 14:21 ` howarth at nitro dot med dot uc dot edu
@ 2009-01-26 14:23 ` rguenther at suse dot de
  2009-02-01 10:37 ` dominiq at lps dot ens dot fr
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2009-01-26 14:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenther at suse dot de  2009-01-26 14:23 -------
Subject: Re: Complex matrix product is not vectorized

On Mon, 26 Jan 2009, howarth at nitro dot med dot uc dot edu wrote:

> ------- Comment #5 from howarth at nitro dot med dot uc dot edu  2009-01-26 14:21 -------
> Is the fix for this PR targeted for gcc 4.4.0 or gcc 4.5 stage 1?

stage1, it is an enhancement.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (5 preceding siblings ...)
  2009-01-26 14:23 ` rguenther at suse dot de
@ 2009-02-01 10:37 ` dominiq at lps dot ens dot fr
  2009-02-01 10:49 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-02-01 10:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from dominiq at lps dot ens dot fr  2009-02-01 10:37 -------
Created an attachment (id=17220)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17220&action=view)
testing complex matrix multiplication

Comment #0 is not fully accurate. With some more testing with the
attached code, I get:
- gcc 4.3.3: no vectorization,
- gcc 4.4.0 (trunk) : vectorization for odd n,
- gcc 4.4.0 + patch from 
  http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01271.html:
  vectorization for all values of n (in the tested range).

The attached code also checks the result of the matrix product, which is
OK. Now, as shown below (in flops/clock cycle), the timings are quite
disappointing (-m64 -O3 -ffast-math -funroll-loops): for odd n, the
vectorized code is slower than the nonvectorized one; for even n, the code
is faster with vectorization, but still significantly slower than with
ifort.

 n     4.3.3       trunk       trunk      ifort
                              +patch       11.0

124     1.33        1.36        1.81        2.61
125     1.37        1.32        1.32        2.20
126     1.36        1.37        1.79        2.55
127     1.37        1.31        1.31        2.22
128     1.38        1.39        1.86        2.64


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (6 preceding siblings ...)
  2009-02-01 10:37 ` dominiq at lps dot ens dot fr
@ 2009-02-01 10:49 ` rguenth at gcc dot gnu dot org
  2009-02-01 10:58 ` dominiq at lps dot ens dot fr
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-02-01 10:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2009-02-01 10:49 -------
This is somewhat expected.  We vectorize the complex product using vectors
of real parts and vectors of imaginary parts of two complex numbers (so we
are not using the fancy haddsub SSE codes).  Did you try enabling SSE3 btw?
Can you post the ifort assembly of the loop?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (7 preceding siblings ...)
  2009-02-01 10:49 ` rguenth at gcc dot gnu dot org
@ 2009-02-01 10:58 ` dominiq at lps dot ens dot fr
  2009-02-01 11:11 ` rguenther at suse dot de
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-02-01 10:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from dominiq at lps dot ens dot fr  2009-02-01 10:58 -------
> Did you try enabling SSE3 btw?

No. How do I get the enabled SSE* by default?

> Can you post the ifort assembly of the loop?

L_B1.14:                        # Preds L_B1.14 L_B1.13
        lea       (%rsi,%r9,8), %r11                            #
        lea       mymatmul_$A.0.1(%rip), %r10                   #27.33
        movaps    (%r10,%r11), %xmm2                            #27.33
        movaps    16(%r10,%r11), %xmm4                          #27.33
        movaps    %xmm0, %xmm3                                  #27.40
        mulps     %xmm2, %xmm3                                  #27.40
        shufps    $177, %xmm2, %xmm2                            #27.40
        lea       (%rdx,%r9,8), %r15                            #
        lea       mymatmul_$C.0.1(%rip), %r14                   #27.24
        movaps    %xmm0, %xmm5                                  #27.40
        addq      $4, %r9                                       #26.12
        mulps     %xmm1, %xmm2                                  #27.40
        cmpq      $128, %r9                                     #26.12
        addsubps  %xmm2, %xmm3                                  #27.40
        addps     (%r14,%r15), %xmm3                            #27.15
        movaps    %xmm3, (%r14,%r15)                            #27.15
        mulps     %xmm4, %xmm5                                  #27.40
        shufps    $177, %xmm4, %xmm4                            #27.40
        mulps     %xmm1, %xmm4                                  #27.40
        addsubps  %xmm4, %xmm5                                  #27.40
        addps     16(%r14,%r15), %xmm5                          #27.15
        movaps    %xmm5, 16(%r14,%r15)                          #27.15
        jl        L_B1.14       # Prob 99%                      #26.12


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (8 preceding siblings ...)
  2009-02-01 10:58 ` dominiq at lps dot ens dot fr
@ 2009-02-01 11:11 ` rguenther at suse dot de
  2009-03-28 10:05 ` rguenth at gcc dot gnu dot org
  2009-03-28 10:06 ` rguenth at gcc dot gnu dot org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2009-02-01 11:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenther at suse dot de  2009-02-01 11:11 -------
Subject: Re: Complex matrix product is not vectorized

On Sun, 1 Feb 2009, dominiq at lps dot ens dot fr wrote:

> ------- Comment #9 from dominiq at lps dot ens dot fr  2009-02-01 10:58 -------
> > Did you try enabling SSE3 btw?
> 
> No. How do I get the enabled SSE* by default?

You can enable SSE3 manually with -msse3, or automatically enable what
your local CPU can do with -march=native.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (9 preceding siblings ...)
  2009-02-01 11:11 ` rguenther at suse dot de
@ 2009-03-28 10:05 ` rguenth at gcc dot gnu dot org
  2009-03-28 10:06 ` rguenth at gcc dot gnu dot org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-03-28 10:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from rguenth at gcc dot gnu dot org  2009-03-28 10:05 -------
Subject: Bug 38968

Author: rguenth
Date: Sat Mar 28 10:05:24 2009
New Revision: 145171

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145171
Log:
2009-03-28  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/38968
        * tree-vect-analyze.c (vect_compute_data_ref_alignment):
        Use FLOOR_MOD_EXPR to compute misalignment.

        * gfortran.dg/vect/fast-math-pr38968.f90: New testcase.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/fast-math-pr38968.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-analyze.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



* [Bug tree-optimization/38968] Complex matrix product is not vectorized
  2009-01-25 16:52 [Bug middle-end/38968] New: Complex matrix product is not vectorized dominiq at lps dot ens dot fr
                   ` (10 preceding siblings ...)
  2009-03-28 10:05 ` rguenth at gcc dot gnu dot org
@ 2009-03-28 10:06 ` rguenth at gcc dot gnu dot org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-03-28 10:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from rguenth at gcc dot gnu dot org  2009-03-28 10:06 -------
Fixed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.5.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968

