public inbox for gcc-bugs@sourceware.org
* [Bug target/18767] New: No vectorization for simple loop
@ 2004-12-01 20:58 bangerth at dealii dot org
  2004-12-02 13:29 ` [Bug tree-optimization/18767] " pinskia at gcc dot gnu dot org
  2005-09-21  1:33 ` pinskia at gcc dot gnu dot org
  0 siblings, 2 replies; 5+ messages in thread
From: bangerth at dealii dot org @ 2004-12-01 20:58 UTC
  To: gcc-bugs

As observed in PR 17619: 
 
Take this code: 
--------------------------------  
struct X { float array[4]; };  
  
X a,b;  
  
float foobar () {  
  float s = 0;  
  for (unsigned int d=0; d<4; ++d)  
    s += a.array[d] * b.array[d];  
  return s;  
}  
--------------------------  
It compiles to 
-------------------------- 
        flds b+12 
        fmuls   a+12 
        movss   b, %xmm1 
        mulss   a, %xmm1 
        addss   .LC0, %xmm1 
        movss   b+4, %xmm0 
        mulss   a+4, %xmm0 
        addss   %xmm0, %xmm1 
        movss   b+8, %xmm0 
        mulss   a+8, %xmm0 
        addss   %xmm0, %xmm1 
        movss   %xmm1, -4(%ebp) 
        flds -4(%ebp) 
        faddp   %st, %st(1) 
-------------------------- 
 
However, the compiler should really vectorize the loop. As Uros points out
in PR 17619, this isn't happening, even though the following modified
function
-------------------------- 
struct X 
{ 
  float array[4]; 
}; 
 
float foobar() 
{ 
  X a, b, c; 
 
  float s = 0; 
  for (unsigned int d = 0; d < 4; ++d) 
    c.array[d] = a.array[d] * b.array[d]; 
 
  for (unsigned int d = 0; d < 4; ++d) 
    s += c.array[d]; 
 
  return s; 
} 
-------------------------- 
generates the optimal code: 
-------------------------- 
        movaps  32(%esp), %xmm0 
        mulps   16(%esp), %xmm0 
        movaps  %xmm0, (%esp) 
        flds    4(%esp) 
        fadds   (%esp) 
        fadds   8(%esp) 
        fadds   12(%esp) 
-------------------------- 
The compiler should be able to make this transformation by itself. 
 
Thanks 
 W. 
--------------------------
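
A minimal sketch of the transformation requested above, not GCC's actual implementation: the fused loop is split into an element-wise multiply (the part that maps onto a single mulps) followed by an in-order sum of the four products. The names dot_fused, dot_split and self_check are hypothetical, and the operands are passed as parameters instead of being read from the globals a and b so that the two forms can be compared directly.

```cpp
#include <cassert>

struct X { float array[4]; };

// Original form: multiply and accumulate in a single loop.
float dot_fused(const X &a, const X &b) {
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}

// Split form: element-wise products first, then an in-order sum.
float dot_split(const X &a, const X &b) {
  X c;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}

// Small integer inputs keep every product and partial sum exact,
// so both forms must return 1*5 + 2*6 + 3*7 + 4*8 = 70.
void self_check() {
  X a = {{1, 2, 3, 4}};
  X b = {{5, 6, 7, 8}};
  assert(dot_fused(a, b) == 70.0f);
  assert(dot_split(a, b) == 70.0f);
}
```

Both forms add the products in the same order, so only the multiplication is being reorganized here; the summation remains a sequential chain of scalar additions.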

-- 
           Summary: No vectorization for simple loop
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bangerth at dealii dot org
                CC: gcc-bugs at gcc dot gnu dot org,uros at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767



* [Bug tree-optimization/18767] No vectorization for simple loop
  2004-12-01 20:58 [Bug target/18767] New: No vectorization for simple loop bangerth at dealii dot org
@ 2004-12-02 13:29 ` pinskia at gcc dot gnu dot org
  2005-09-21  1:33 ` pinskia at gcc dot gnu dot org
  1 sibling, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-02 13:29 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-02 13:29 -------
Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|target                      |tree-optimization
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2004-12-02 13:29:13
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767



* [Bug tree-optimization/18767] No vectorization for simple loop
  2004-12-01 20:58 [Bug target/18767] New: No vectorization for simple loop bangerth at dealii dot org
  2004-12-02 13:29 ` [Bug tree-optimization/18767] " pinskia at gcc dot gnu dot org
@ 2005-09-21  1:33 ` pinskia at gcc dot gnu dot org
  1 sibling, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-21  1:33 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-21 01:33 -------
With -ffast-math on the mainline, we get:
_Z6foobarv:
.LFB2:
        movaps  b(%rip), %xmm1
        mulps   a(%rip), %xmm1
        movaps  %xmm1, %xmm2
        movaps  %xmm1, %xmm0
        shufps  $85, %xmm1, %xmm2
        addss   %xmm2, %xmm0
        movaps  %xmm1, %xmm2
        unpckhps        %xmm1, %xmm2
        shufps  $255, %xmm1, %xmm1
        addss   %xmm2, %xmm0
        addss   %xmm1, %xmm0
        ret



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767



* [Bug tree-optimization/18767] No vectorization for simple loop
       [not found] <bug-18767-4@http.gcc.gnu.org/bugzilla/>
  2011-05-22 16:05 ` steven at gcc dot gnu.org
@ 2011-05-22 21:50 ` ubizjak at gmail dot com
  1 sibling, 0 replies; 5+ messages in thread
From: ubizjak at gmail dot com @ 2011-05-22 21:50 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2011-05-22 21:25:27 UTC ---
(In reply to comment #3)
> Still not vectorized in recent GCC:
> 
> t.c:7: note: not vectorized: unsupported use in stmt.

pr18767.c:10: note: reduction: unsafe fp math optimization: s_8 = D.2696_7 + s_16;

-ffast-math solves this.

foobar:
.LFB0:
    .cfi_startproc
    vmovaps    b(%rip), %xmm0
    vmulps    a(%rip), %xmm0, %xmm0
    vhaddps    %xmm0, %xmm0, %xmm0
    vhaddps    %xmm0, %xmm0, %xmm0
    ret
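
The "unsafe fp math optimization" note refers to the reduction into s: the two vhaddps steps add the products pairwise, as (x0 + x1) + (x2 + x3), and floating-point addition is not associative, so the result can differ from the in-order scalar loop. A small sketch of that difference follows. The function names and the input values are hypothetical, and volatile is used only to force every intermediate result to be rounded to float.

```cpp
#include <cassert>

// In-order sum, as the scalar loop computes it:
// (((0 + v0) + v1) + v2) + v3.
float sum_in_order(const float (&v)[4]) {
  volatile float s = 0.0f;
  for (unsigned int d = 0; d < 4; ++d)
    s = s + v[d];
  return s;
}

// Pairwise sum, the order two horizontal adds produce:
// (v0 + v1) + (v2 + v3).
float sum_pairwise(const float (&v)[4]) {
  volatile float lo = v[0] + v[1];
  volatile float hi = v[2] + v[3];
  return lo + hi;
}

// Near 1e8 adjacent floats are 8 apart, so adding 1.0f to 1e8f or
// to -1e8f is lost to rounding. In order, the trailing 1.0f survives;
// pairwise, both 1.0f terms are absorbed.
void self_check() {
  const float v[4] = {1e8f, 1.0f, -1e8f, 1.0f};
  assert(sum_in_order(v) == 1.0f);
  assert(sum_pairwise(v) == 0.0f);
}
```

Because the two orders can disagree like this, the compiler may only reassociate the sum when the user opts in, which is what -ffast-math does here.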



* [Bug tree-optimization/18767] No vectorization for simple loop
       [not found] <bug-18767-4@http.gcc.gnu.org/bugzilla/>
@ 2011-05-22 16:05 ` steven at gcc dot gnu.org
  2011-05-22 21:50 ` ubizjak at gmail dot com
  1 sibling, 0 replies; 5+ messages in thread
From: steven at gcc dot gnu.org @ 2011-05-22 16:05 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2011-05-22 15:47:24 UTC ---
Still not vectorized in recent GCC:

t.c:7: note: not vectorized: unsupported use in stmt.

     1    typedef struct { float array[4]; } X;  
     2     
     3    X a,b;  
     4     
     5    float foobar () {  
     6        float s = 0;  
     7        for (unsigned int d=0; d<4; ++d)  
     8          s += a.array[d] * b.array[d];  
     9        return s;  
    10    }  
    11    

"GCC: (GNU) 4.6.0 20110312 (experimental) [trunk revision 170907]"



