[Bug target/114514] New: v16qi >> 7 can be optimized with vpcmpgtb

* [Bug target/114514] New: v16qi >> 7 can be optimized with vpcmpgtb
From: liuhongt at gcc dot gnu.org @ 2024-03-28  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514

            Bug ID: 114514
           Summary: v16qi >> 7 can be optimized with vpcmpgtb
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

typedef signed char v16qi __attribute__((vector_size(16)));

v16qi
foo2 (v16qi a, v16qi b)
{
    return a >> 7;
}

It can be optimized to (Intel syntax):
        vpxor   xmm1, xmm1, xmm1
        vpcmpgtb        xmm0, xmm1, xmm0
        ret

Currently we generate the following (emulated with a 16-bit-element shift, v8hi; AT&T syntax):

        movl    $16843009, %eax
        vpsraw  $7, %xmm0, %xmm0
        vmovd   %eax, %xmm1
        vpbroadcastd    %xmm1, %xmm1
        vpandn  %xmm1, %xmm0, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
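
A quick sanity check of the equivalence, as a sketch using the GCC vector extensions. The names `shift_ref`, `shift_by_compare`, `current_lane` and `forms_agree` are made up for illustration and are not GCC internals; `current_lane` is only a scalar model of one byte lane of the sequence above, not what the backend literally does.

```c
#include <assert.h>

typedef signed char v16qi __attribute__((vector_size(16)));

/* Reference: per-byte arithmetic shift right by 7. */
v16qi shift_ref(v16qi a)
{
    return a >> 7;
}

/* Proposed form: 0 > a yields all-ones (-1) for negative lanes and 0
   otherwise, which is what vpxor + vpcmpgtb computes. */
v16qi shift_by_compare(v16qi a)
{
    v16qi zero = {0};
    return (v16qi)(zero > a);
}

/* Scalar model of one byte lane of the currently generated sequence:
   the 16-bit shift leaves the byte's sign bit in bit 0, vpandn with 1
   inverts and isolates it, and vpsubb 1 maps 0 -> -1 and 1 -> 0. */
signed char current_lane(signed char x)
{
    int sign = ((unsigned char)x >> 7) & 1;
    return (signed char)((~sign & 1) - 1);
}

/* Returns 1 if all three forms agree for every possible byte value. */
int forms_agree(void)
{
    for (int base = -128; base < 128; base += 16) {
        v16qi a;
        for (int i = 0; i < 16; i++)
            a[i] = (signed char)(base + i);
        v16qi r = shift_ref(a);
        v16qi c = shift_by_compare(a);
        for (int i = 0; i < 16; i++) {
            assert(r[i] == 0 || r[i] == -1);  /* only the sign survives */
            if (r[i] != c[i] || r[i] != current_lane(a[i]))
                return 0;
        }
    }
    return 1;
}
```

Calling `forms_agree()` from a small driver covers all 256 byte values, so it should be enough to convince oneself that the compare form is a drop-in replacement for the shift by 7.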

* [Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb
From: pinskia at gcc dot gnu.org @ 2024-03-28 23:09 UTC (permalink / raw)
  To: gcc-bugs

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2024-03-28
                 CC|                            |pinskia at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

Note that the non-sign-bit case (shift counts other than 7) can be improved too:
```
#define vector __attribute__((vector_size(16)))

typedef vector signed char v16qi;
typedef vector unsigned char v16uqi;

v16qi
foo2 (v16qi a, v16qi b)
{
    return a >> 6;
}
v16uqi
foo1 (v16uqi a, v16uqi b)
{
    return a >> 6;
}
```

clang produces:
```
_Z4foo2Dv16_aS_:
        psrlw   $6, %xmm0
        pand    .LCPI0_0(%rip), %xmm0 #{3,3,3,...}
        movdqa  .LCPI0_1(%rip), %xmm1 #{2,2,2,...}
        pxor    %xmm1, %xmm0
        psubb   %xmm1, %xmm0
        retq
_Z4foo1Dv16_hS_:
        psrlw   $6, %xmm0
        pand    .LCPI1_0(%rip), %xmm0 #{3,3,3,...}
        retq
```
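
To make the clang sequence easier to follow, here is a scalar sketch of what it computes per lane. The function names are hypothetical, and this models only the arithmetic, not the instruction selection: the 16-bit shift lets bits of the high byte leak into the low byte, so the `pand` with 3 cleans that up, and the signed case then sign-extends the 2-bit result with xor/sub by 2 (where the sign bit lands after the shift).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned case: shift a 16-bit lane holding two bytes, then mask off
   the bits that crossed over from the high byte (psrlw $6 ; pand {3,...}). */
uint16_t shr6_two_bytes(uint16_t w)
{
    return (uint16_t)((w >> 6) & 0x0303);
}

/* Signed case for one byte lane: take the 2-bit logical result and
   sign-extend it (pxor {2,...} ; psubb {2,...}). */
int8_t sar6_byte(int8_t x)
{
    unsigned u = ((unsigned)(uint8_t)x >> 6) & 3;  /* logical shift, 0..3 */
    return (int8_t)((int)(u ^ 2u) - 2);            /* result is in -2..1 */
}

/* Exhaustive check against the plain per-byte shifts. Assumes >> on a
   negative int is an arithmetic shift, as it is with GCC and clang. */
void check_shift6(void)
{
    for (int w = 0; w < 65536; w++) {
        uint8_t lo = (uint8_t)(w & 0xff), hi = (uint8_t)(w >> 8);
        uint16_t expect = (uint16_t)(((hi >> 6) << 8) | (lo >> 6));
        assert(shr6_two_bytes((uint16_t)w) == expect);
    }
    for (int x = -128; x < 128; x++)
        assert(sar6_byte((int8_t)x) == (int8_t)(x >> 6));
}
```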

* [Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb
From: pinskia at gcc dot gnu.org @ 2024-03-28 23:14 UTC (permalink / raw)
  To: gcc-bugs

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For a non-constant shift count, clang produces:
```
signedshiftright:
        movzbl  %dil, %eax
        movd    %eax, %xmm1
        psrlw   %xmm1, %xmm0
        pcmpeqd %xmm2, %xmm2
        psrlw   %xmm1, %xmm2
        movdqa  .LCPI0_0(%rip), %xmm3           # xmm3 = [32896,32896,32896,32896,32896,32896,32896,32896]
        psrlw   %xmm1, %xmm3
        psrlw   $8, %xmm2
        punpcklbw       %xmm2, %xmm2            # xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
        pshuflw $0, %xmm2, %xmm1                # xmm1 = xmm2[0,0,0,0,4,5,6,7]
        pshufd  $0, %xmm1, %xmm1                # xmm1 = xmm1[0,0,0,0]
        pand    %xmm1, %xmm0
        pxor    %xmm3, %xmm0
        psubb   %xmm3, %xmm0
        retq

unsignedshiftrtight:
        movzbl  %dil, %eax
        movd    %eax, %xmm1
        psrlw   %xmm1, %xmm0
        pcmpeqd %xmm2, %xmm2
        psrlw   %xmm1, %xmm2
        psrlw   $8, %xmm2
        punpcklbw       %xmm2, %xmm2            # xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
        pshuflw $0, %xmm2, %xmm1                # xmm1 = xmm2[0,0,0,0,4,5,6,7]
        pshufd  $0, %xmm1, %xmm1                # xmm1 = xmm1[0,0,0,0]
        pand    %xmm1, %xmm0
        retq
```

I am not sure which way is faster here though.
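
For reference, a scalar sketch of what the variable-count sequences above compute per byte lane, assuming a count in 0..7. The helper names are invented for illustration. The byte mask is 0xff >> n (built above from all-ones via the two `psrlw` steps and the broadcast), and the sign position is 0x80 >> n (32896 == 0x8080, shifted by n).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned lane: logical shift, then keep only the bits that belong to
   this byte (the pand with the broadcast 0xff >> n mask). In scalar code
   the mask is redundant; it is kept here to mirror the vector sequence. */
uint8_t shr_byte(uint8_t x, unsigned n)
{
    assert(n <= 7);
    unsigned mask = 0xffu >> n;
    return (uint8_t)(((unsigned)x >> n) & mask);
}

/* Signed lane: same logical shift, then sign-extend from the position
   where the sign bit ended up (pxor/psubb with 0x80 >> n). */
int8_t sar_byte(int8_t x, unsigned n)
{
    assert(n <= 7);
    unsigned sign = 0x80u >> n;
    unsigned u = shr_byte((uint8_t)x, n);
    return (int8_t)((int)(u ^ sign) - (int)sign);
}
```

Checking `sar_byte` against `x >> n` for all 256 values and all eight counts is cheap, which makes this a convenient reference when comparing candidate expansions.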

* [Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb
From: liuhongt at gcc dot gnu.org @ 2024-03-29  1:03 UTC (permalink / raw)
  To: gcc-bugs

--- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> Confirmed.
> 
> Note non sign bit can be improved too:
I assume you're talking about broadcasting the constant from an immediate
versus loading it directly from the constant pool. GCC chooses the former;
with -Os we can also generate the latter. According to a microbenchmark, the
former is better. I also tried disabling the broadcast from an immediate and
testing with stress-ng vecmath; the performance is similar.
