public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/50918] New: Unoptimal code for vec-shift by scalar for integer (byte, short, long long) operands
From: ubizjak at gmail dot com @ 2011-10-30  9:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50918

             Bug #: 50918
           Summary: Unoptimal code for vec-shift by scalar for integer
                    (byte, short, long long) operands
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ubizjak@gmail.com


The following testcase:

--cut here--
#define N 8

short a[N] = { 1,2,3,4,10,20,30,90 };
short r[N];

void
test_var (int n)
{
  int i;

  for (i = 0; i < N; i++)
    r[i] = a[i] << n;
}

void
test_cst (void)
{
  int i;

  for (i = 0; i < N; i++)
    r[i] = a[i] << 3;
}
--cut here--

compiles to (-march=corei7 -O2 -ftree-vectorize):

test_var:
    movdqa    a(%rip), %xmm0
    movd    %edi, %xmm2
    pmovsxwd    %xmm0, %xmm1
    psrldq    $8, %xmm0
    pmovsxwd    %xmm0, %xmm0
    pslld    %xmm2, %xmm1
    pslld    %xmm2, %xmm0
    pshufb    .LC0(%rip), %xmm1
    pshufb    .LC1(%rip), %xmm0
    por    %xmm1, %xmm0
    movdqa    %xmm0, r(%rip)
    ret

test_cst:
    movdqa    a(%rip), %xmm0
    pmovsxwd    %xmm0, %xmm1
    psrldq    $8, %xmm0
    pmovsxwd    %xmm0, %xmm0
    pslld    $3, %xmm1
    pshufb    .LC0(%rip), %xmm1
    pslld    $3, %xmm0
    pshufb    .LC1(%rip), %xmm0
    por    %xmm1, %xmm0
    movdqa    %xmm0, r(%rip)
    ret

Why not psllw?
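
For comparison, the code one might hope for (a hand-written sketch, not
actual GCC output; register choice is illustrative):

test_var:
    movdqa    a(%rip), %xmm0
    movd    %edi, %xmm1
    psllw    %xmm1, %xmm0
    movdqa    %xmm0, r(%rip)
    ret

test_cst:
    movdqa    a(%rip), %xmm0
    psllw    $3, %xmm0
    movdqa    %xmm0, r(%rip)
    ret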

The .optimized dump already shows:

test_var (int n)
{
  vector(8) short int vect_var_.16;
  vector(4) int vect_var_.15;
  vector(4) int vect_var_.14;
  vector(8) short int vect_var_.13;

<bb 2>:
  vect_var_.13_23 = MEM[(short int[8] *)&a];
  vect_var_.14_24 = [vec_unpack_lo_expr] vect_var_.13_23;
  vect_var_.14_25 = [vec_unpack_hi_expr] vect_var_.13_23;
  vect_var_.15_26 = vect_var_.14_24 << n_5(D);
  vect_var_.15_27 = vect_var_.14_25 << n_5(D);
  vect_var_.16_28 = VEC_PACK_TRUNC_EXPR <vect_var_.15_26, vect_var_.15_27>;
  MEM[(short int[8] *)&r] = vect_var_.16_28;
  return;

}


test_cst ()
{
  vector(8) short int vect_var_.36;
  vector(4) int vect_var_.35;
  vector(4) int vect_var_.34;
  vector(8) short int vect_var_.33;

<bb 2>:
  vect_var_.33_22 = MEM[(short int[8] *)&a];
  vect_var_.34_23 = [vec_unpack_lo_expr] vect_var_.33_22;
  vect_var_.34_24 = [vec_unpack_hi_expr] vect_var_.33_22;
  vect_var_.35_25 = vect_var_.34_23 << 3;
  vect_var_.35_26 = vect_var_.34_24 << 3;
  vect_var_.36_27 = VEC_PACK_TRUNC_EXPR <vect_var_.35_25, vect_var_.35_26>;
  MEM[(short int[8] *)&r] = vect_var_.36_27;
  return;

}
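
Ideally the vectorizer would keep the shift in the narrow type, so the dump
would look more like this (a sketch, with illustrative SSA names):

test_var (int n)
{
  vector(8) short int vect_var_.16;
  vector(8) short int vect_var_.13;

<bb 2>:
  vect_var_.13_23 = MEM[(short int[8] *)&a];
  vect_var_.16_28 = vect_var_.13_23 << n_5(D);
  MEM[(short int[8] *)&r] = vect_var_.16_28;
  return;

}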

The same suboptimal code is generated for long long and byte (on an -mxop
target) arguments, signed and unsigned, for both left and right shifts.
OTOH, int arguments produce optimal code for left and right shifts:

test_var:
    movdqa    a(%rip), %xmm0
    movd    %edi, %xmm1
    pslld    %xmm1, %xmm0
    movdqa    %xmm0, r(%rip)
    ret

test_cst:
    movdqa    a(%rip), %xmm0
    pslld    $3, %xmm0
    movdqa    %xmm0, r(%rip)
    ret
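
(For reference, the int results above presumably come from the same testcase
with the arrays declared as int:

int a[N] = { 1,2,3,4,10,20,30,90 };
int r[N];

with the test_var and test_cst loop bodies unchanged.)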



* [Bug tree-optimization/50918] Unoptimal code for vec-shift by scalar for integer (byte, short, long long) operands
From: jakub at gcc dot gnu.org @ 2011-11-01 14:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50918

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-01 14:18:47 UTC ---
I don't see inefficient shifts being created for long long, at least not
with -mavx2.
For the char/short shifts, I guess we'd want to add a
vect_recog_narrow_shift_pattern that would recognize these.
For shifts by a constant, I think it can always be done: if the constant is
bigger than the precision of the narrower type, then for left shifts and
unsigned right shifts we can just change the statement into clearing the
destination (though the earlier optimizers should have done that already),
and for arithmetic right shifts we could do x < 0 ? -1 : 0. For shifts by a
variable we'd need a target hook or macro describing how vector shifts with
a shift count larger than or equal to the precision behave.
AFAIK all i?86 vector shifts DTRT here (i.e. left shifts and unsigned right
shifts with a too-big count store zero, and arithmetic right shifts store
copies of the sign bit everywhere), but e.g. Altivec shifts truncate the
count and thus can't be used.
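
In scalar terms, the constant-count cases described above behave like this
(a minimal C sketch; the helper names are hypothetical):

/* Results of shifting a 16-bit element by a constant count >= 16.  */
short          shl_big  (short x)          { return 0; }              /* left */
unsigned short shrl_big (unsigned short x) { return 0; }              /* logical right */
short          shra_big (short x) { return x < 0 ? -1 : 0; }          /* arithmetic right */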



* [Bug tree-optimization/50918] Unoptimal code for vec-shift by scalar for integer (byte, short, long long) operands
From: pinskia at gcc dot gnu.org @ 2021-08-28 19:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50918

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Since GCC 9, the following is produced for test_cst:
        movdqa  a(%rip), %xmm0
        psllw   $3, %xmm0
        movaps  %xmm0, r(%rip)
        ret


For test_var, GCC produces the following on the trunk:
test_var:
        movdqa  a(%rip), %xmm0
        movslq  %edi, %rax
        movq    %rax, %xmm2
        pmovsxwd        %xmm0, %xmm1
        psrldq  $8, %xmm0
        pmovsxwd        %xmm0, %xmm0
        pslld   %xmm2, %xmm1
        pslld   %xmm2, %xmm0
        movdqa  .LC0(%rip), %xmm2
        pand    %xmm2, %xmm1
        pand    %xmm0, %xmm2
        movdqa  %xmm1, %xmm0
        packusdw        %xmm2, %xmm0
        movaps  %xmm0, r(%rip)
        ret
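
A psllw-based sequence is still possible for the variable count: psllw zeroes
the result for counts >= 16, and the scalar code may assume the count is in
range anyway. A minimal intrinsics sketch, assuming the arrays from the
original testcase are in scope (the function name is hypothetical):

#include <immintrin.h>

void
test_var_psllw (int n)
{
  __m128i v = _mm_load_si128 ((const __m128i *) a);
  /* _mm_sll_epi16 maps to psllw with the count in an xmm register.  */
  v = _mm_sll_epi16 (v, _mm_cvtsi32_si128 (n));
  _mm_store_si128 ((__m128i *) r, v);
}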

