public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/115021] New: [14/15 regression] unnecessary spill for vpternlog
@ 2024-05-10  3:57 liuhongt at gcc dot gnu.org
  2024-05-10 12:35 ` [Bug rtl-optimization/115021] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-05-10  3:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

            Bug ID: 115021
           Summary: [14/15 regression] unnecessary spill for vpternlog
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

typedef signed char v16qi __attribute__ ((__vector_size__ (16)));
v16qi foo (v16qi x) { return x >> 5; }

With -march=x86-64-v4 -O2, GCC 13.2 generates:

foo(signed char __vector(16)):
        mov     eax, 4
        vpsraw  xmm2, xmm0, 5
        vpbroadcastb    xmm1, eax
        mov     eax, 7
        vpbroadcastb    xmm3, eax
        vmovdqa xmm0, xmm1
        vpternlogd      xmm0, xmm2, xmm3, 120
        vpsubb  xmm0, xmm0, xmm1
        ret

GCC 14.1 generates:

foo(signed char __vector(16)):
        mov     eax, 67372036
        vpsraw  xmm2, xmm0, 5
        vpbroadcastd    xmm1, eax
        mov     eax, 117901063
        vpbroadcastd    xmm3, eax
        vmovdqa xmm0, xmm1
        vmovdqa XMMWORD PTR [rsp-24], xmm3
        vpternlogd      xmm0, xmm2, XMMWORD PTR [rsp-24], 120
        vpsubb  xmm0, xmm0, xmm1
        ret

There's an extra spill: xmm3 is stored to [rsp-24] and immediately read back
as the memory operand of vpternlogd.
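
For readers decoding the listings: the last operand of vpternlogd (120 here) is
an 8-bit truth table. Per the Intel SDM, bit ((a << 2) | (b << 1) | c) of the
immediate gives the result bit for destination bit a, first-source bit b and
second-source bit c. The sketch below rebuilds the immediate for a ^ (b & c),
which is what this sequence appears to compute (the destination is preloaded
with the 4s, the sources are the shifted value and the 7s). The names
ternlog_imm8, xor_and and check_ternlog_imm8 are hypothetical, chosen for
illustration; they are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* A three-input bitwise function, evaluated on single bits (0 or 1). */
typedef unsigned (*bitfn3)(unsigned a, unsigned b, unsigned c);

/* Build the vpternlog imm8 for f (hypothetical helper).
   Bit ((a << 2) | (b << 1) | c) of the immediate holds f(a, b, c),
   where a is the destination operand and b, c are the two sources. */
static uint8_t ternlog_imm8(bitfn3 f)
{
    unsigned imm = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        unsigned a = (idx >> 2) & 1u;
        unsigned b = (idx >> 1) & 1u;
        unsigned c = idx & 1u;
        imm |= (f(a, b, c) & 1u) << idx;
    }
    return (uint8_t)imm;
}

/* a ^ (b & c): the function assumed for the listings above. */
static unsigned xor_and(unsigned a, unsigned b, unsigned c)
{
    return a ^ (b & c);
}

/* Aborts if the truth table for a ^ (b & c) is not 120 (0x78). */
static void check_ternlog_imm8(void)
{
    assert(ternlog_imm8(xor_and) == 120);
}
```

Calling check_ternlog_imm8() returns silently when the computed table matches
the immediate in the listings.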


* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: rguenth at gcc dot gnu.org @ 2024-05-10 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.2
           Priority|P3                          |P2


* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: roger at nextmovesoftware dot com @ 2024-05-10 15:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
   Last reconfirmed|                            |2024-05-10
                 CC|                            |roger at nextmovesoftware dot com
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I have a patch for x86 ternlog handling that changes the output for this
testcase (without the pending change to optimize V8QI shifts) to:
foo:    movl    $67372036, %eax
        vpsraw  $5, %xmm0, %xmm0
        vpbroadcastd    %eax, %xmm1
        vpternlogd      $108, .LC0(%rip), %xmm1, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret
        .align 16
.LC0:
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7

which at least doesn't construct the vector with a broadcast, and then "spill"
it to the stack before reading it back from memory.   I've no idea if this is
optimal, but it's certainly better than the current "spill".

I'm curious about what has changed to make this code (register allocation)
regress since GCC 13.  It was a patch of mine that changed broadcastb to
broadcastd, but that shouldn't have affected reload/register preferencing.
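
As a side note on the constants in these listings: 67372036 and 117901063 are
0x04040404 and 0x07070707, that is, the bytes 4 and 7 replicated four times,
which is why a dword broadcast can stand in for the byte broadcast. A small C
sketch that checks the arithmetic (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one byte into all four bytes of a 32-bit value. */
static uint32_t splat_byte_u32(uint8_t b)
{
    return (uint32_t)b * 0x01010101u;
}

/* Aborts if the decimal constants from the listings do not match. */
static void check_splat_constants(void)
{
    assert(splat_byte_u32(4) == 67372036u);   /* 0x04040404 */
    assert(splat_byte_u32(7) == 117901063u);  /* 0x07070707 */
}
```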


* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: roger at nextmovesoftware dot com @ 2024-05-10 17:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

--- Comment #2 from Roger Sayle <roger at nextmovesoftware dot com> ---
Here's a reduced test case that should be unaffected by the pending changes to
how V8QI shifts are expanded.  Note that the final "t -= t4" is required to
convince the register allocator to "spill".

typedef signed char v16qi __attribute__ ((__vector_size__ (16)));
// sign-extend low 3 bits to a byte.
v16qi foo (v16qi x) {
    v16qi t7 = (v16qi){7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
    v16qi t4 = (v16qi){4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4};
    v16qi t = x & t7;
    t ^= t4;
    t -= t4;
    return t;
}

which produces:

foo:    movl    $67372036, %eax
        vmovdqa %xmm0, %xmm2
        vpbroadcastd    %eax, %xmm1
        movl    $117901063, %eax
        vpbroadcastd    %eax, %xmm3
        vmovdqa %xmm1, %xmm0
        vmovdqa %xmm3, -24(%rsp)
        vmovdqa -24(%rsp), %xmm4
        vpternlogd      $120, %xmm2, %xmm4, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret
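
The "sign-extend low 3 bits" comment in the test case can be checked with a
scalar model of one byte lane. This is only a sketch: the function names are
hypothetical, and it models the arithmetic of the and/xor/sub sequence, not
the register allocation problem itself.

```c
#include <assert.h>

/* One byte lane of the reduced test case: ((x & 7) ^ 4) - 4. */
static int sext3_xor_sub(unsigned char x)
{
    int t = x & 7;  /* t = x & t7 */
    t ^= 4;         /* t ^= t4 */
    t -= 4;         /* t -= t4 */
    return t;       /* always in the range -4..3 */
}

/* Reference: treat the low 3 bits as a signed 3-bit number. */
static int sext3_reference(unsigned char x)
{
    int v = x & 7;
    return v >= 4 ? v - 8 : v;
}

/* Aborts if the two definitions disagree for any byte value. */
static void check_sext3(void)
{
    for (int x = 0; x < 256; x++)
        assert(sext3_xor_sub((unsigned char)x)
               == sext3_reference((unsigned char)x));
}
```

The xor with 4 flips the sign bit of the 3-bit field and the subtraction
undoes the flip while borrowing into the upper bits, which is the usual
xor/sub sign-extension idiom.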


* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: lin1.hu at intel dot com @ 2024-05-20  7:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Hu Lin <lin1.hu at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lin1.hu at intel dot com

--- Comment #3 from Hu Lin <lin1.hu at intel dot com> ---
I found that the compiler assigns memory to the third source operand of
vpternlog in IRA after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e, which
causes the generated code to be:

        .cfi_startproc
        movl    $4, %eax
        vpsraw  $5, %xmm0, %xmm2
        vpbroadcastb    %eax, %xmm1
        movl    $7, %eax
        vpbroadcastb    %eax, %xmm3
        vmovdqa %xmm1, %xmm0
        vpternlogd      $120, %xmm3, %xmm2, %xmm0
        vmovdqa %xmm3, -24(%rsp)
        vpsubb  %xmm1, %xmm0, %xmm0
        ret

Commit 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 then changes the vector type
from v16qi to v4si, so the movv4si can no longer be combined with the
vpternlog in postreload, which gives the result you see now.


* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: liuhongt at gcc dot gnu.org @ 2024-05-20  7:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hu Lin from comment #3)
> I found compiler allocates mem to the third source register of vpternlog in
> IRA after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e. And it cause the
> generate code will be 
> 
>   8         .cfi_startproc
>   9         movl    $4, %eax
>  10         vpsraw  $5, %xmm0, %xmm2
>  11         vpbroadcastb    %eax, %xmm1
>  12         movl    $7, %eax
>  13         vpbroadcastb    %eax, %xmm3
>  14         vmovdqa %xmm1, %xmm0
>  15         vpternlogd      $120, %xmm3, %xmm2, %xmm0
>  16         vmovdqa %xmm3, -24(%rsp)
>  17         vpsubb  %xmm1, %xmm0, %xmm0
>  18         ret
> 
> And 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 changes the vector type from
> v16qi to v4si, leading to movv4si can't combine with the vpternlog in
> postreload, so the result is what you see now.

To clarify: the extra spill is caused by r14-4944-gf55cdce3f8dd85;
r14-7026-g6a67fdcb3f0cc8 only causes an extra mov instruction (which is not a
big deal).


* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: liuhongt at gcc dot gnu.org @ 2024-06-13 14:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

--- Comment #5 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
This is fixed by r15-1100-gec985bc97a0157.

