* [Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb
2024-03-28 9:38 [Bug target/114514] New: v16qi >> 7 can be optimized with vpcmpgtb liuhongt at gcc dot gnu.org
@ 2024-03-28 23:09 ` pinskia at gcc dot gnu.org
2024-03-28 23:14 ` pinskia at gcc dot gnu.org
2024-03-29 1:03 ` liuhongt at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-28 23:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Last reconfirmed| |2024-03-28
CC| |pinskia at gcc dot gnu.org
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
Note that shifts by a count other than 7 (i.e. not just extracting the sign bit) can be improved too:
```
#define vector __attribute__((vector_size(16)))
typedef vector signed char v16qi;
typedef vector unsigned char v16uqi;
v16qi
foo2 (v16qi a, v16qi b)
{
return a >> 6;
}
v16uqi
foo1 (v16uqi a, v16uqi b)
{
return a >> 6;
}
```
clang produces:
```
_Z4foo2Dv16_aS_:
psrlw $6, %xmm0
pand .LCPI0_0(%rip), %xmm0 #{3,3,3,...}
movdqa .LCPI0_1(%rip), %xmm1 #{2,2,2,...}
pxor %xmm1, %xmm0
psubb %xmm1, %xmm0
retq
_Z4foo1Dv16_hS_:
psrlw $6, %xmm0
pand .LCPI1_0(%rip), %xmm0 #{3,3,3,...}
retq
```
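The signed sequence above does a 16-bit logical shift, masks off the bits that leaked in from the neighbouring byte, and then sign-extends the remaining 2-bit value with the `(x ^ m) - m` trick, where `m` is the shifted sign-bit position. Below is a minimal scalar sketch of one 16-bit lane, assuming little-endian byte order within the lane and the GCC/clang behaviour that `>>` on a negative value is an arithmetic shift. The function names are hypothetical and only serve to illustrate the idea; this is not code from either compiler.

```c
#include <stdint.h>

/* Reference: arithmetic right shift of one signed byte by 6.
   Assumes >> on negative values is arithmetic (true for GCC and clang). */
static int8_t sar6_ref(int8_t x)
{
    return (int8_t)(x >> 6);
}

/* Scalar model of psrlw/pand/pxor/psubb on one 16-bit lane (two bytes). */
static void sar6_lane(const int8_t in[2], int8_t out[2])
{
    /* Pack the two bytes into a 16-bit lane, low byte first. */
    uint16_t lane = (uint16_t)((uint8_t)in[0] | ((uint16_t)(uint8_t)in[1] << 8));

    lane >>= 6;                              /* psrlw $6 */
    for (int i = 0; i < 2; i++) {
        uint8_t b = (uint8_t)(lane >> (8 * i));
        b &= 3;                              /* pand {3,3,3,...}: drop neighbour bits */
        b ^= 2;                              /* pxor {2,2,2,...}: flip shifted sign bit */
        b = (uint8_t)(b - 2);                /* psubb {2,2,2,...}: finish sign extension */
        out[i] = (int8_t)b;
    }
}

/* Exhaustively compare the model with the reference; returns 1 on success. */
int sar6_matches_reference(void)
{
    for (int lo = -128; lo < 128; lo++) {
        for (int hi = -128; hi < 128; hi++) {
            int8_t in[2] = { (int8_t)lo, (int8_t)hi };
            int8_t out[2];
            sar6_lane(in, out);
            if (out[0] != sar6_ref(in[0]) || out[1] != sar6_ref(in[1]))
                return 0;
        }
    }
    return 1;
}
```

The unsigned case needs only the first two steps (`psrlw` and `pand`), which matches the shorter `foo1` output.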
From: pinskia at gcc dot gnu.org @ 2024-03-28 23:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For a non-constant shift count, clang produces:
```
signedshiftright:
movzbl %dil, %eax
movd %eax, %xmm1
psrlw %xmm1, %xmm0
pcmpeqd %xmm2, %xmm2
psrlw %xmm1, %xmm2
movdqa .LCPI0_0(%rip), %xmm3 # xmm3 = [32896,32896,32896,32896,32896,32896,32896,32896]
psrlw %xmm1, %xmm3
psrlw $8, %xmm2
punpcklbw %xmm2, %xmm2 # xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshuflw $0, %xmm2, %xmm1 # xmm1 = xmm2[0,0,0,0,4,5,6,7]
pshufd $0, %xmm1, %xmm1 # xmm1 = xmm1[0,0,0,0]
pand %xmm1, %xmm0
pxor %xmm3, %xmm0
psubb %xmm3, %xmm0
retq
unsignedshiftrtight:
movzbl %dil, %eax
movd %eax, %xmm1
psrlw %xmm1, %xmm0
pcmpeqd %xmm2, %xmm2
psrlw %xmm1, %xmm2
psrlw $8, %xmm2
punpcklbw %xmm2, %xmm2 # xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshuflw $0, %xmm2, %xmm1 # xmm1 = xmm2[0,0,0,0,4,5,6,7]
pshufd $0, %xmm1, %xmm1 # xmm1 = xmm1[0,0,0,0]
pand %xmm1, %xmm0
retq
```
I am not sure which way is faster here though.
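For reference, the variable-count signed sequence computes the same `(x ^ m) - m` sign extension as the constant case, except that both the byte mask (`0xFF >> n`) and the sign constant (`0x80 >> n`, from shifting the 32896 = 0x8080 words) are derived from the runtime count. The following is a per-byte scalar sketch under the assumptions that the count `n` is in the range 0..7 and that `>>` on negative values is arithmetic, as in GCC and clang. The function names are hypothetical, and the sketch says nothing about which sequence is faster.

```c
#include <stdint.h>

/* Per-byte model of the variable-count signed shift, for n in 0..7. */
static int8_t sar_byte_model(int8_t a, unsigned n)
{
    uint8_t mask = (uint8_t)(0xFFu >> n);   /* bits that belong to this byte */
    uint8_t sign = (uint8_t)(0x80u >> n);   /* where the sign bit lands after the shift */

    /* psrlw + pand: in the real 16-bit lanes the mask removes bits shifted
       in from the neighbouring byte; in this per-byte model it is a no-op. */
    uint8_t x = (uint8_t)(((uint8_t)a >> n) & mask);

    /* pxor + psubb: sign-extend from the shifted sign-bit position. */
    return (int8_t)(uint8_t)((x ^ sign) - sign);
}

/* Exhaustively compare against the plain arithmetic shift; returns 1 on success. */
int sar_model_matches_reference(void)
{
    for (unsigned n = 0; n < 8; n++) {
        for (int v = -128; v < 128; v++) {
            int8_t ref = (int8_t)((int8_t)v >> n);
            if (sar_byte_model((int8_t)v, n) != ref)
                return 0;
        }
    }
    return 1;
}
```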