public inbox for gcc-bugs@sourceware.org
* [Bug target/114428] New: [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld
@ 2024-03-22 5:33 liuhongt at gcc dot gnu.org
From: liuhongt at gcc dot gnu.org @ 2024-03-22 5:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428
Bug ID: 114428
Summary: [x86] psrad xmm, xmm, 16 and pand xmm, const_vector
(0xffff x4) can be optimized to psrld
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---
typedef unsigned short uint16_t;
typedef short int16_t;

#define QUANT_ONE( coef, mf, f ) \
{ \
    if( (coef) > 0 ) \
        (coef) = (f + (coef)) * (mf) >> 16; \
    else \
        (coef) = - ((f - (coef)) * (mf) >> 16); \
    nz |= (coef); \
}

int quant_4x4( int16_t dct[16], uint16_t mf[16], uint16_t bias[16] )
{
    int nz = 0;
    for( int i = 0; i < 16; i++ )
        QUANT_ONE( dct[i], mf[i], bias[i] );
    return !!nz;
}
Compiling with gcc -O2 -march=x86-64-v3 -S generates:
mov edx, 65535
vmovd xmm4, edx
vpbroadcastd ymm4, xmm4
...
vpsrad ymm2, ymm2, 16
vpsrad ymm6, ymm6, 16
vpsrad ymm0, ymm0, 16
vpand ymm2, ymm4, ymm2
vpsrad ymm1, ymm1, 16
vpand ymm6, ymm4, ymm6
vpand ymm0, ymm4, ymm0
vpand ymm4, ymm4, ymm1
vpackusdw ymm2, ymm2, ymm6
vpackusdw ymm0, ymm0, ymm4
vpermq ymm2, ymm2, 216
vpermq ymm0, ymm0, 216
...
It can be optimized to the following:
vpsrld ymm2, ymm2, 16
vpsrld ymm6, ymm6, 16
vpsrld ymm0, ymm0, 16
vpsrld ymm1, ymm1, 16
vpackusdw ymm2, ymm2, ymm6
vpackusdw ymm0, ymm0, ymm1
vpermq ymm2, ymm2, 216
vpermq ymm0, ymm0, 216
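The optimization opportunity is exposed after vec_pack_trunc_expr is expanded
to vpand + vpackusdw. The arithmetic shift only differs from the logical shift
in the upper 16 bits of each element, and those bits are cleared by the vpand,
so vpsrad + vpand with 0xffff gives the same result as a single vpsrld.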
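
The equivalence can be checked with a small scalar model of one 32-bit lane. This is only a sketch, not GCC code: the function names (sra16_then_mask, srl16, pack_usat16, count_mismatches) are made up for illustration, and it assumes that >> on a negative int32_t is an arithmetic shift, which is what GCC does on x86.

```c
#include <stdint.h>

/* Scalar model of psrad by 16 followed by pand with 0xffff.
   Assumes >> on a negative value is an arithmetic shift (true for GCC). */
static uint32_t sra16_then_mask(int32_t x)
{
    return (uint32_t)(x >> 16) & 0xffffu;
}

/* Scalar model of psrld by 16. */
static uint32_t srl16(int32_t x)
{
    return (uint32_t)x >> 16;
}

/* Scalar model of one packusdw lane: signed 32-bit input,
   saturated to the unsigned 16-bit range. */
static uint16_t pack_usat16(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > 0xffff)
        return 0xffff;
    return (uint16_t)x;
}

/* Count disagreements for a single input value. */
static unsigned check_one(int32_t x)
{
    unsigned bad = 0;
    uint32_t a = sra16_then_mask(x);
    uint32_t b = srl16(x);

    if (a != b)
        bad++;                      /* shift + mask vs. logical shift */
    if (pack_usat16((int32_t)a) != pack_usat16((int32_t)b))
        bad++;                      /* packed results must agree too */
    if (pack_usat16((int32_t)b) != (uint16_t)b)
        bad++;                      /* saturation never triggers after srl */
    return bad;
}

/* Sample the 32-bit range (stride 0x10001) plus the signed extremes. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;

    for (uint64_t v = 0; v <= 0xffffffffu; v += 0x10001u)
        bad += check_one((int32_t)(uint32_t)v);
    bad += check_one(INT32_MIN);
    bad += check_one(INT32_MAX);
    return bad;
}
```

count_mismatches() is expected to return 0: after a logical shift by 16 every lane is already in the range 0..65535, so the mask is redundant and the saturation in vpackusdw never triggers.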