public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/50918] New: Unoptimal code for vec-shift by scalar for integer (byte, short, long long) operands
From: ubizjak at gmail dot com @ 2011-10-30 9:10 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50918
Bug #: 50918
Summary: Unoptimal code for vec-shift by scalar for integer
(byte, short, long long) operands
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: ubizjak@gmail.com
The following testcase:
--cut here--
#define N 8

short a[N] = { 1,2,3,4,10,20,30,90 };
short r[N];

void
test_var (int n)
{
  int i;

  for (i = 0; i < N; i++)
    r[i] = a[i] << n;
}

void
test_cst (void)
{
  int i;

  for (i = 0; i < N; i++)
    r[i] = a[i] << 3;
}
--cut here--
compiles to (-march=corei7 -O2 -ftree-vectorize):
test_var:
        movdqa  a(%rip), %xmm0
        movd    %edi, %xmm2
        pmovsxwd        %xmm0, %xmm1
        psrldq  $8, %xmm0
        pmovsxwd        %xmm0, %xmm0
        pslld   %xmm2, %xmm1
        pslld   %xmm2, %xmm0
        pshufb  .LC0(%rip), %xmm1
        pshufb  .LC1(%rip), %xmm0
        por     %xmm1, %xmm0
        movdqa  %xmm0, r(%rip)
        ret
test_cst:
        movdqa  a(%rip), %xmm0
        pmovsxwd        %xmm0, %xmm1
        psrldq  $8, %xmm0
        pmovsxwd        %xmm0, %xmm0
        pslld   $3, %xmm1
        pshufb  .LC0(%rip), %xmm1
        pslld   $3, %xmm0
        pshufb  .LC1(%rip), %xmm0
        por     %xmm1, %xmm0
        movdqa  %xmm0, r(%rip)
        ret
Why not psllw?
The .optimized dump already shows:
test_var (int n)
{
  vector(8) short int vect_var_.16;
  vector(4) int vect_var_.15;
  vector(4) int vect_var_.14;
  vector(8) short int vect_var_.13;

<bb 2>:
  vect_var_.13_23 = MEM[(short int[8] *)&a];
  vect_var_.14_24 = [vec_unpack_lo_expr] vect_var_.13_23;
  vect_var_.14_25 = [vec_unpack_hi_expr] vect_var_.13_23;
  vect_var_.15_26 = vect_var_.14_24 << n_5(D);
  vect_var_.15_27 = vect_var_.14_25 << n_5(D);
  vect_var_.16_28 = VEC_PACK_TRUNC_EXPR <vect_var_.15_26, vect_var_.15_27>;
  MEM[(short int[8] *)&r] = vect_var_.16_28;
  return;
}

test_cst ()
{
  vector(8) short int vect_var_.36;
  vector(4) int vect_var_.35;
  vector(4) int vect_var_.34;
  vector(8) short int vect_var_.33;

<bb 2>:
  vect_var_.33_22 = MEM[(short int[8] *)&a];
  vect_var_.34_23 = [vec_unpack_lo_expr] vect_var_.33_22;
  vect_var_.34_24 = [vec_unpack_hi_expr] vect_var_.33_22;
  vect_var_.35_25 = vect_var_.34_23 << 3;
  vect_var_.35_26 = vect_var_.34_24 << 3;
  vect_var_.36_27 = VEC_PACK_TRUNC_EXPR <vect_var_.35_25, vect_var_.35_26>;
  MEM[(short int[8] *)&r] = vect_var_.36_27;
  return;
}
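The widen/shift/pack sequence computes, element by element, the same value a
direct narrow shift would for any in-range count. A minimal scalar sketch of
the equivalence (my illustration, not GCC internals; the function names are
made up):

/* What the vectorized code does per element.  */
short
shift_widened (short x, int n)
{
  int wide = x;             /* [vec_unpack_lo_expr] / [vec_unpack_hi_expr] */
  wide <<= n;               /* the two pslld on the widened lanes */
  return (short) wide;      /* VEC_PACK_TRUNC_EXPR */
}

/* For 0 <= n < 16 this yields the same result, and vectorized it is
   a single psllw over all eight words.  */
short
shift_narrow (short x, int n)
{
  return (short) (x << n);
}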
The same suboptimal code is generated for long long and byte (on an -mxop
target) operands, both signed and unsigned, and for both left and right
shifts. OTOH, int operands produce optimal code for left and right shifts:
test_var:
        movdqa  a(%rip), %xmm0
        movd    %edi, %xmm1
        pslld   %xmm1, %xmm0
        movdqa  %xmm0, r(%rip)
        ret
test_cst:
        movdqa  a(%rip), %xmm0
        pslld   $3, %xmm0
        movdqa  %xmm0, r(%rip)
        ret
* [Bug tree-optimization/50918] Unoptimal code for vec-shift by scalar for integer (byte, short, long long) operands
From: jakub at gcc dot gnu.org @ 2011-11-01 14:19 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50918
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-01 14:18:47 UTC ---
I don't see that for long long we would create inefficient shifts, at least
not with -mavx2.
For the char/short shifts, I guess we'd want to add a
vect_recog_narrow_shift_pattern that would recognize these.
For shifts by a constant, I think this can always be done: if the constant is
at least the precision of the narrower type, then for left shifts and
unsigned right shifts we can just change the statement into clearing the
destination (though the earlier optimizers should have done that already),
and for arithmetic right shifts we could do x < 0 ? -1 : 0. For shifts by a
variable, we'd need a target hook or macro describing how vector shifts
behave when the shift count is larger than or equal to the precision.
AFAIK all i?86 vector shifts DTRT for this (i.e. left and unsigned right
shifts with a too-big count store zero, and arithmetic right shifts store
copies of the sign bit everywhere), but e.g. Altivec shifts truncate the
shift count and thus can't be used.
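In scalar terms, the constant-count rewrites described above would look like
this (a sketch assuming the i?86 out-of-range semantics; the function names
are made up):

/* Narrowed left shift: a count >= 16 clears the destination.  */
short
narrow_lshift (short x, int cst)
{
  if (cst >= 16)
    return 0;
  return (short) (x << cst);
}

/* Narrowed arithmetic right shift: a count >= 16 leaves copies of
   the sign bit everywhere, i.e. x < 0 ? -1 : 0.  */
short
narrow_arshift (short x, int cst)
{
  if (cst >= 16)
    return x < 0 ? -1 : 0;
  return (short) (x >> cst);
}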
* [Bug tree-optimization/50918] Unoptimal code for vec-shift by scalar for integer (byte, short, long long) operands
From: pinskia at gcc dot gnu.org @ 2021-08-28 19:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50918
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Since GCC 9, the following is produced for test_cst:
        movdqa  a(%rip), %xmm0
        psllw   $3, %xmm0
        movaps  %xmm0, r(%rip)
        ret
For test_var, GCC produces the following on the trunk:
test_var:
        movdqa  a(%rip), %xmm0
        movslq  %edi, %rax
        movq    %rax, %xmm2
        pmovsxwd        %xmm0, %xmm1
        psrldq  $8, %xmm0
        pmovsxwd        %xmm0, %xmm0
        pslld   %xmm2, %xmm1
        pslld   %xmm2, %xmm0
        movdqa  .LC0(%rip), %xmm2
        pand    %xmm2, %xmm1
        pand    %xmm0, %xmm2
        movdqa  %xmm1, %xmm0
        packusdw        %xmm2, %xmm0
        movaps  %xmm0, r(%rip)
        ret
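For comparison, a single variable-count psllw would suffice for test_var,
since the SSE2 word shifts take the count from an xmm register and already
have the desired out-of-range behavior. A minimal sketch with SSE2
intrinsics, using a[] and r[] from the testcase (my illustration, not
compiler output):

#include <emmintrin.h>

extern short a[8], r[8];

void
test_var_psllw (int n)
{
  __m128i v = _mm_load_si128 ((const __m128i *) a);
  __m128i cnt = _mm_cvtsi32_si128 (n);    /* movd %edi, %xmm1 */
  v = _mm_sll_epi16 (v, cnt);             /* psllw %xmm1, %xmm0 */
  _mm_store_si128 ((__m128i *) r, v);     /* movdqa %xmm0, r(%rip) */
}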