[Bug tree-optimization/58497] New: SLP vectorizes identical operations

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/58497] New: SLP vectorizes identical operations
@ 2013-09-22  7:04 glisse at gcc dot gnu.org
  2013-09-23  8:33 ` [Bug tree-optimization/58497] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-09-22  7:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

            Bug ID: 58497
           Summary: SLP vectorizes identical operations
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org

typedef float float4 __attribute__((vector_size(16)));

float4 g(int x)
{
  float4 W;
  W[0]=W[1]=W[2]=W[3]=x+1;
  return W;
}

is vectorized by SLP to:

  vect_cst_.4_11 = {x_1(D), x_1(D), x_1(D), x_1(D)};
  vect__2.3_13 = vect_cst_.4_11 + { 1, 1, 1, 1 };
  vect__3.6_14 = (vector(4) floatD.38) vect__2.3_13;

Maybe when a vector is really the same scalar copied into all slots it would be
better not to turn the scalar ops into vector ops? (turning the 4 BIT_FIELD_REF
writes into a constructor is still good though)
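The shape the reporter is asking for can be written by hand at the source level. The sketch below is only an illustration using GCC's generic vector extensions; `g_scalar_first` and `lanes_equal` are hypothetical names that do not appear in the report.

```c
typedef float float4 __attribute__((vector_size(16)));

/* The reporter's test case: every lane receives the same scalar x+1. */
float4 g(int x)
{
  float4 W;
  W[0] = W[1] = W[2] = W[3] = x + 1;
  return W;
}

/* Hypothetical hand-written equivalent: do the add and the int->float
   conversion once on scalars, then build the vector with a single
   constructor.  */
float4 g_scalar_first(int x)
{
  float s = (float)(x + 1);
  float4 W = { s, s, s, s };
  return W;
}

/* Hypothetical helper: return 1 if all four lanes compare equal.  */
int lanes_equal(float4 a, float4 b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Both functions should return the same vector for any `x` where `x + 1` does not overflow; the second one simply spells out "scalar ops first, one constructor at the end".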



* [Bug tree-optimization/58497] SLP vectorizes identical operations
  2013-09-22  7:04 [Bug tree-optimization/58497] New: SLP vectorizes identical operations glisse at gcc dot gnu.org
@ 2013-09-23  8:33 ` rguenth at gcc dot gnu.org
  2013-09-23  9:03 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-09-23  8:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-09-23
         Depends on|                            |53947
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heh ;)  I suppose this started with BIT_FIELD_REF support in SLP; 4.8 didn't
vectorize this at all.

Note that with for example

typedef float float4 __attribute__((vector_size(16)));

float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}

vectorizing two identical operations may be profitable.  But yes, if all
scalars are the same there is no point in doing it.  And the cost model
should have disabled it as well (though likely the four "stores"
made it profitable in the end).

I will have a look at some point.

OTOH generated code is

g:
.LFB0:
        .cfi_startproc
        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        paddd   .LC0(%rip), %xmm0
        cvtdq2ps        %xmm0, %xmm0
        ret

vs. -fno-tree-vectorize:

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm1, %xmm1
        addl    $1, %edi
        xorps   %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $36, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $196, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        unpcklps        %xmm0, %xmm0
        movss   %xmm1, %xmm0
        shufps  $225, %xmm2, %xmm0
        movss   %xmm1, %xmm0
        ret

so clearly a win, but improvable to something like

        addl    $1, %edi
        cvtsi2ss        %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0

The above also shows that vector init by BIT_FIELD_REF is not expanded
very well (something for a generalized vector shuffle recognition in the bswap pass).
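For readers who prefer intrinsics to assembly, the two strategies compared above can be sketched roughly as follows. This is an x86-only illustration assuming SSE2; the function names are hypothetical, and the instruction named in each comment is the one the intrinsic typically maps to, not a guarantee about what the compiler emits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE ones) */

/* Mirrors the SLP-vectorized sequence: broadcast x, vector add,
   vector int->float conversion.  */
__m128 g_vector_ops(int x)
{
  __m128i vx  = _mm_set1_epi32(x);                    /* broadcast, e.g. pshufd */
  __m128i sum = _mm_add_epi32(vx, _mm_set1_epi32(1)); /* paddd */
  return _mm_cvtepi32_ps(sum);                        /* cvtdq2ps */
}

/* Mirrors the suggested improvement: scalar add and conversion,
   then a single broadcast.  */
__m128 g_scalar_then_splat(int x)
{
  return _mm_set1_ps((float)(x + 1));
}

/* Hypothetical helper: return 1 if all four lanes compare equal.  */
int same_lanes(__m128 a, __m128 b)
{
  float fa[4], fb[4];
  _mm_storeu_ps(fa, a);
  _mm_storeu_ps(fb, b);
  for (int i = 0; i < 4; i++)
    if (fa[i] != fb[i])
      return 0;
  return 1;
}
```

The second form needs only one broadcast and no vector constant load, which is the saving the suggested three-instruction sequence is after.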



* [Bug tree-optimization/58497] SLP vectorizes identical operations
  2013-09-22  7:04 [Bug tree-optimization/58497] New: SLP vectorizes identical operations glisse at gcc dot gnu.org
  2013-09-23  8:33 ` [Bug tree-optimization/58497] " rguenth at gcc dot gnu.org
@ 2013-09-23  9:03 ` rguenth at gcc dot gnu.org
  2015-10-22 13:37 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-09-23  9:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 30884
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30884&action=edit
prototype patch

A quick check shows generated code will be

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm0, %xmm0
        addl    $1, %edi
        cvtsi2ss        %edi, %xmm0
        shufps  $0, %xmm0, %xmm0
        ret

and the patch shows possible issues with finding an insert location for
the init stmt (otherwise "external" is just outside of the current
basic-block).



* [Bug tree-optimization/58497] SLP vectorizes identical operations
  2013-09-22  7:04 [Bug tree-optimization/58497] New: SLP vectorizes identical operations glisse at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-10-22 13:37 ` rguenth at gcc dot gnu.org
@ 2015-10-22 13:37 ` rguenth at gcc dot gnu.org
  2021-08-14 23:27 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-10-22 13:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Thu Oct 22 13:36:46 2015
New Revision: 229173

URL: https://gcc.gnu.org/viewcvs?rev=229173&root=gcc&view=rev
Log:
2015-10-22  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/58497
        * tree-vect-generic.c (ssa_uniform_vector_p): New helper.
        (expand_vector_operations_1): Use it.  Lower operations on
        all uniform vectors to scalar operations if the HW supports it.

        * gcc.dg/tree-ssa/vector-5.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/vector-5.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-generic.c

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Now we fix this up in veclower; still, the bug should be addressed in SLP
directly (also because it affects cost decisions).
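The idea behind the veclower change ("lower operations on all uniform vectors to scalar operations") can be modelled with a small runtime sketch. This is not GCC's code: the real pass inspects SSA definitions at compile time, while the toy below checks lane values at run time, and `uniform_lane`, `add_lowered` and `int4_equal` are hypothetical names.

```c
typedef int int4 __attribute__((vector_size(16)));

/* Hypothetical helper (not GCC's ssa_uniform_vector_p): report whether
   all lanes of v hold the same value and, if so, store it in *elt.  */
int uniform_lane(int4 v, int *elt)
{
  for (int i = 1; i < 4; i++)
    if (v[i] != v[0])
      return 0;
  *elt = v[0];
  return 1;
}

/* If both operands are uniform, do one scalar add and splat the result;
   otherwise fall back to the ordinary vector add.  */
int4 add_lowered(int4 a, int4 b)
{
  int sa, sb;
  if (uniform_lane(a, &sa) && uniform_lane(b, &sb))
    {
      int s = sa + sb;
      return (int4){ s, s, s, s };
    }
  return a + b;
}

/* Hypothetical helper: return 1 if all four lanes compare equal.  */
int int4_equal(int4 a, int4 b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Either path produces the same lanes; the point of the lowering is only that the uniform case needs one scalar operation instead of a full vector one.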



* [Bug tree-optimization/58497] SLP vectorizes identical operations
  2013-09-22  7:04 [Bug tree-optimization/58497] New: SLP vectorizes identical operations glisse at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-10-22 13:37 ` rguenth at gcc dot gnu.org
@ 2021-08-14 23:27 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-14 23:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine for GCC 13. I have patches which turn:

  W_6 = BIT_INSERT_EXPR <W_5(D), _2, 96 (32 bits)>;
  W_7 = BIT_INSERT_EXPR <W_6, _2, 64 (32 bits)>;
  W_8 = BIT_INSERT_EXPR <W_7, _2, 32 (32 bits)>;
  W_9 = BIT_INSERT_EXPR <W_8, _2, 0 (32 bits)>;

into:

  W_9 = {_2,_2,_2,_2};

This improvement deals with bitfields, but vectors have a similar issue with
BIT_INSERT_EXPRs, so I deal with it there.

