[Bug tree-optimization/41464] New: vector loads are unnecessarily split into high and low loads


* [Bug tree-optimization/41464]  New: vector loads are unnecessarily split into high and low loads
From: nmiell at comcast dot net @ 2009-09-24 23:14 UTC (permalink / raw)
  To: gcc-bugs

gcc (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)

The testcase (built with -Wall -O3):

#include <math.h>

void MulPi(float * __attribute__((aligned(16))) i,
           float * __attribute__((aligned(16))) f, int n)
{
        for (int j = 0; j < n; j++)
                f[j] = (float) M_PI * i[j];
}

produces the following for the vectorized version of the loop:

.L7:
        movaps  %xmm1, %xmm0            # zero XMM0
        incl    %ecx                    
        movlps  (%rdi,%rax), %xmm0      # load the low half into XMM0
        movhps  8(%rdi,%rax), %xmm0     # load the high half into XMM0
        mulps   %xmm2, %xmm0            # multiply by pi
        movaps  %xmm0, (%rsi,%rax)      # store to memory
        addq    $16, %rax
        cmpl    %r8d, %ecx
        jb      .L7


-- 
           Summary: vector loads are unnecessarily split into high and low
                    loads
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: nmiell at comcast dot net
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: rguenth at gcc dot gnu dot org @ 2009-09-25  9:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-09-25 09:06 -------
The interesting thing is that data-ref analysis sees 128bit alignment but
the vectorizer still produces

  vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
  D.2564_12 = *D.2563_11;
  vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
  D.2565_13 = D.2564_12 * 2.2999999523162841796875e+0;
  M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;

thus, unknown misalignment.

(instantiate_scev
  (instantiate_below = 3)
  (evolution_loop = 1)
  (chrec = {i_10(D), +, 4}_1)
  (res = {i_10(D), +, 4}_1))
        base_address: i_10(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *i_10(D)
Creating dr for *D.2562_7
  (res = {f_6(D), +, 4}_1))
        base_address: f_6(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *f_6(D)


t2.i:5: note: === vect_enhance_data_refs_alignment ===
t2.i:5: note: Vectorizing an unaligned access.
t2.i:5: note: Vectorizing an unaligned access.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2009-09-25 09:06:12
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: nmiell at comcast dot net @ 2009-09-25 17:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from nmiell at comcast dot net  2009-09-25 17:12 -------
Even if it thinks the arrays aren't aligned, that doesn't explain the
completely unnecessary zeroing of XMM0 or the choice of the load high/low
instructions over MOVUPS.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: ubizjak at gmail dot com @ 2009-09-25 17:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from ubizjak at gmail dot com  2009-09-25 17:33 -------
(In reply to comment #2)
> Even if it thinks the arrays aren't aligned, that doesn't explain the
> completely unnecessary zeroing of XMM0 or the choice of the load high/low
> instructions over MOVUPS.

This is by design, see config/i386/i386.c, ix86_expand_vector_move_misalign and
the comment above this function.
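
[Editor's sketch, not part of the original message: the two load strategies
being compared can be written with SSE intrinsics roughly as below. The
function names load_movups and load_split are made up for illustration, and
the thread does not quote the i386.c comment, so this only shows what each
sequence does, not why GCC prefers one.]

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Unaligned 16-byte load as a single instruction (MOVUPS). */
static __m128 load_movups(const float *p)
{
    return _mm_loadu_ps(p);
}

/* The same 16 bytes loaded as two 8-byte halves (MOVLPS + MOVHPS)
   into a register that is zeroed first, mirroring the reported loop. */
static __m128 load_split(const float *p)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)p);        /* low two floats  */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));  /* high two floats */
    return v;
}
```

[Both helpers return the same four floats for any readable address; they
differ only in the instruction sequence used to fetch them.]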


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: irar at il dot ibm dot com @ 2009-09-27  8:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from irar at il dot ibm dot com  2009-09-27 08:06 -------
(In reply to comment #1)
> The interesting thing is that data-ref analysis sees 128bit alignment but
> the vectorizer still produces
>   vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
>   D.2564_12 = *D.2563_11;
>   vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
>   D.2565_13 = D.2564_12 * 2.2999999523162841796875e+0;
>   M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;
> thus, unknown misalignment.
> (instantiate_scev
>   (instantiate_below = 3)
>   (evolution_loop = 1)
>   (chrec = {i_10(D), +, 4}_1)
>   (res = {i_10(D), +, 4}_1))
>         base_address: i_10(D)
>         offset from base address: 0
>         constant offset from base address: 0
>         step: 4
>         aligned to: 128
>         base_object: *i_10(D)
> Creating dr for *D.2562_7
>   (res = {f_6(D), +, 4}_1))
>         base_address: f_6(D)
>         offset from base address: 0
>         constant offset from base address: 0
>         step: 4
>         aligned to: 128
>         base_object: *f_6(D)
> t2.i:5: note: === vect_enhance_data_refs_alignment ===
> t2.i:5: note: Vectorizing an unaligned access.
> t2.i:5: note: Vectorizing an unaligned access.

"aligned to" refers to the offset misalignment and not to the misalignment of
base.
attribute aligned works only for arrays, i.e., declarations, and not for
pointer arguments. For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
the base type.

Ira
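
[Editor's sketch, not part of the original message: since the attribute on a
pointer parameter is not picked up here, one alternative in later GCC
releases is __builtin_assume_aligned, which is not available in the 4.4
series discussed in this report. The function name mul_pi_assume_aligned and
the constant PI_F are hypothetical; whether the vectorizer then emits
aligned loads still depends on the compiler version and flags.]

```c
/* Single-precision pi, standing in for (float) M_PI from the testcase. */
static const float PI_F = 3.14159265f;

/* Multiply n floats by pi. The caller promises that both pointers are
   16-byte aligned; passing misaligned pointers is undefined behavior. */
void mul_pi_assume_aligned(const float *in, float *out, int n)
{
    const float *i = __builtin_assume_aligned(in, 16);
    float *f = __builtin_assume_aligned(out, 16);

    for (int j = 0; j < n; j++)
        f[j] = PI_F * i[j];
}
```

[The promise is made on the pointer value rather than on the pointed-to
type, so it does not rely on TYPE_ALIGN_UNIT of the scalar type.]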


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: rguenther at suse dot de @ 2009-09-27  9:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenther at suse dot de  2009-09-27 09:43 -------
Subject: Re: vector loads are unnecessarily split into high and low loads

On Sun, 27 Sep 2009, irar at il dot ibm dot com wrote:

> ------- Comment #4 from irar at il dot ibm dot com  2009-09-27 08:06 -------
> (In reply to comment #1)
> > The interesting thing is that data-ref analysis sees 128bit alignment but
> > the vectorizer still produces
> >   vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
> >   D.2564_12 = *D.2563_11;
> >   vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
> >   D.2565_13 = D.2564_12 * 2.2999999523162841796875e+0;
> >   M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;
> > thus, unknown misalignment.
> > (instantiate_scev
> >   (instantiate_below = 3)
> >   (evolution_loop = 1)
> >   (chrec = {i_10(D), +, 4}_1)
> >   (res = {i_10(D), +, 4}_1))
> >         base_address: i_10(D)
> >         offset from base address: 0
> >         constant offset from base address: 0
> >         step: 4
> >         aligned to: 128
> >         base_object: *i_10(D)
> > Creating dr for *D.2562_7
> >   (res = {f_6(D), +, 4}_1))
> >         base_address: f_6(D)
> >         offset from base address: 0
> >         constant offset from base address: 0
> >         step: 4
> >         aligned to: 128
> >         base_object: *f_6(D)
> > t2.i:5: note: === vect_enhance_data_refs_alignment ===
> > t2.i:5: note: Vectorizing an unaligned access.
> > t2.i:5: note: Vectorizing an unaligned access.
> 
> "aligned to" refers to the offset misalignment and not to the misalignment of
> base.

Hmm, I believe it refers to base + offset + constant offset.

> attribute aligned works only for arrays, i.e., declarations, and not for
> pointer arguments.

I have to check that - I believe that in principle it should work.

> For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
> the base type.

That should be ok.  I guess I have to see what's going on here.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: irar at il dot ibm dot com @ 2009-09-27  9:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from irar at il dot ibm dot com  2009-09-27 09:56 -------
(In reply to comment #5)
> > 
> > "aligned to" refers to the offset misalignment and not to the misalignment of
> > base.
> Hmm, I believe it refers to base + offset + constant offset.
tree-data-ref.h:
  /* Alignment information.  ALIGNED_TO is set to the largest power of two
     that divides OFFSET.  */
  tree aligned_to;

tree-data-ref.c:
DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));


> > attribute aligned works only for arrays, i.e., declarations, and not for
> > pointer arguments.
> I have to check that - I believe that in principle it should work.
> > For pointers the vectorizer only checks TYPE_ALIGN_UNIT of
> > the base type.
> That should be ok.  

But we need TYPE_ALIGN_UNIT to be 16, and we are checking scalar type here, so
without user defined alignment it will be 4.

Ira

> I guess I have to see what's going on here.
> Richard.
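
[Editor's sketch, not part of the original message: "the largest power of
two that divides OFFSET" can be computed by isolating the lowest set bit.
The helper below is a stand-alone illustration with a made-up name and a
made-up cap parameter; it is not GCC's highest_pow2_factor. The assumption
that a zero offset yields some maximum value is the editor's, chosen because
it would be consistent with "aligned to: 128" next to "offset from base
address: 0" in the dumps above.]

```c
#include <stdint.h>

/* Largest power of two that divides x. Every power of two divides 0,
   so for x == 0 the caller-supplied cap is returned instead. */
static uint64_t largest_pow2_factor(uint64_t x, uint64_t cap)
{
    uint64_t low = x & (~x + 1);  /* keeps only the lowest set bit */
    return low ? low : cap;
}
```

[For example, an offset of 24 gives 8 and an offset of 7 gives 1. The value
depends only on the offset, not on how the base pointer is aligned.]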


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: rguenth at gcc dot gnu dot org @ 2010-01-24 11:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2010-01-24 11:52 -------
*** Bug 42846 has been marked as a duplicate of this bug. ***


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bredelin at ucla dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: rguenth at gcc dot gnu dot org @ 2010-01-24 12:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2010-01-24 12:08 -------
In the testcase from PR42846 one issue is that

        base_address: p__3(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *(const aligned_real * restrict) p__3(D)

only in the base object we see the cast to the aligned pointer, but it is
stripped for the base address in the innermost loop.

So in the end all this boils down to the Frontend / middle-end issue of
weak handling of aligned types.
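
[Editor's sketch, not part of the original message: the PR42846 testcase is
not quoted in this thread. A source pattern that would produce a base object
like the one in the dump might look as follows. Only the names aligned_real
and p_ come from the dump; the typedef definition, the function scale_by_two
and the loop body are guesses for illustration.]

```c
/* Hypothetical over-aligned scalar type: size 4, declared alignment 16. */
typedef float aligned_real __attribute__((aligned(16)));

/* Scale n floats by 2. The alignment promise lives only in the cast to
   the aligned pointer type, which is the part the dump shows surviving in
   base_object but not in base_address. */
void scale_by_two(float *restrict out, const float *restrict p_, int n)
{
    const aligned_real *restrict p = (const aligned_real *restrict)p_;

    for (int i = 0; i < n; i++)
        out[i] = 2.0f * p[i];
}
```

[The function computes the same result whether or not the compiler honors
the alignment encoded in the type; only the generated loads can differ.]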


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: bredelin at ucla dot edu @ 2010-02-04 20:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from bredelin at ucla dot edu  2010-02-04 20:29 -------
(In reply to comment #8)
> So in the end all this boils down to the Frontend / middle-end issue of
> weak handling of aligned types.

Would you mind giving a general idea of what the outlook for improvement on
this front is?

Also, this is interesting:
http://eigen.tuxfamily.org/index.php?title=Benchmark


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464


* [Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
From: pinskia at gcc dot gnu.org @ 2021-12-13  0:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0
      Known to work|                            |4.9.0
      Known to fail|                            |4.4.7, 4.5.3, 4.7.1, 4.8.1,
                   |                            |4.8.5
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
4.8.5 did:

        movaps  xmm0, xmm1
        add     ecx, 1
        movlps  xmm0, QWORD PTR [rdi+rax]
        movhps  xmm0, QWORD PTR [rdi+8+rax]
        mulps   xmm0, xmm2
        movlps  QWORD PTR [rsi+rax], xmm0
        movhps  QWORD PTR [rsi+8+rax], xmm0

But 4.9.0 has:

        movaps  xmm0, XMMWORD PTR [rbp+0+r9]
        add     r10d, 1
        mulps   xmm0, xmm1
        movups  XMMWORD PTR [rax+r9], xmm0


So all fixed for GCC 4.9.0.

4.9.0
  vect__11.13_94 = MEM[base: vectp_i.12_90, index: ivtmp.28_28, offset: 0B];
  vect__12.14_96 = vect__11.13_94 * { 3.1415927410125732421875e+0, 3.1415927410125732421875e+0, 3.1415927410125732421875e+0, 3.1415927410125732421875e+0 };
  MEM[base: vectp_f.17_97, index: ivtmp.28_28, offset: 0B] = vect__12.14_96;
